SUBSET SELECTION PROCEDURES: A REVIEW
AND  AN  ASSESSMENT* 

Shanti  S.  Gupta  and  S.  Panchapakesan 

Purdue  University  Southern  Illinois  University 

1.  INTRODUCTION 

It  is  well  over  three  decades  since  statistical  inference  problems 
were  first  posed  in  the  now  familiar  selection  and  ranking  framework. 

More  than  700  papers  have  been  published  over  these  years  in  journals 
and  proceedings  of  international  conferences.  During  the  last  fifteen 
years,  five  books  and  a  categorized  bibliography  have  been  published. 

Starting  with  a  handful  of  researchers  in  the  fifties,  the  area  of  selection 
and  ranking  procedures  has  gained  the  attention  of  numerous  active  researchers 
today. 

Selection and ranking problems have generally been studied using either
the indifference-zone approach of Bechhofer (1954) or the so-called subset
selection approach due mainly to Gupta (1956). A comprehensive survey of
significant contributions using these two approaches covering a span of
almost thirty years is given in Gupta and Panchapakesan (1979). The present
paper is mainly concerned with the subset selection approach. Our aim is
to provide a historical perspective, trace the major developments that took
place in the subset selection theory over the years 1950-1980 divided into
three periods, indicate the recent trends, and discuss the impact of
the research in this area and the directions for future research. In
doing so, we will not be concerned with details of the several procedures
but only with the nature and the trend of the developments in each period.

*This research was supported by the Office of Naval Research Contract
N00014-75-C-0455 at Purdue University. Reproduction in whole or in
part is permitted for any purpose of the United States Government.

The periods themselves serve more as a general reference to the several
phases of growth of the theory than as a precise partition of the entire
period.

2.  HISTORICAL  PERSPECTIVE 

In many practical situations, the experimenter is faced with the
problem of comparing $k$ ($\ge 2$) populations. These may, for example, represent
different varieties of wheat in an agricultural experiment, or different
competing coherent systems in engineering models, or different drugs for a
certain ailment. In all these problems, each population is characterized
by the value of a parameter $\theta$. In the above-mentioned examples, this
parameter $\theta$ may be the average yield of a variety of wheat, or the reliability
function of a system, or an appropriate measure of the effectiveness of a
drug.

The  classical  approach  in  the  preceding  situations  has  been  to  test  the 
so-called homogeneity hypothesis $H_0$ that $\theta_1 = \cdots = \theta_k$, where $\theta_1,\ldots,\theta_k$ are the
(unknown) values of the parameter $\theta$ for the $k$ populations. If the populations
are assumed to be normal with means $\theta_1,\ldots,\theta_k$ and a common unknown variance
$\sigma^2$ (which is a nuisance parameter), we have the familiar one-way classification
model  and  the  test  can  be  carried  out  using  Fisher's  analysis  of  variance 
technique.  However,  this  usually  does  not  serve  the  experimenter's  real  purpose 
which  is  not  just  to  accept  or  reject  the  homogeneity  hypothesis.  The  real 
goal  often  is  to  identify  the  best  population  (the  variety  with  the  largest 
average  yield,  the  most  reliable  system  and  so  on).  As  Bechhofer  noted  in  his 
now classical 1954 paper, the deficiencies of the ANOVA 'do not lie in the
design aspects of the procedure but rather in the types of decisions which are
made on the basis of the data'. Of course it was recognized (see Cochran and
Cox,  1950,  p.  5) that  the  hypothesis  that  there  is  no  difference  between 
different  treatments  is  unrealistic  and  that  the  real  problem  is  to  obtain 
estimates  of  the  sizes  of  the  differences  between  the  treatments.  However, 
the  method  of  estimating  the  sizes  of  differences  was  often  used  as  an 
indirect  way  of  attempting  to  reach  the  goal  of  finding  the  best  treatment 
or  treatments.  The  attempts  to  formulate  the  decision  problem  to  answer  this 
realistic  goal  set  the  stage  for  the  development  of  the  selection  and  ranking 
theory. 

The two main approaches that have been used in formulating a selection
and ranking problem are familiarly known as the indifference zone approach
and the subset selection approach. Suppose there are $k$ populations $\pi_1,\ldots,\pi_k$,
where $\pi_i$ is characterized by the distribution function $F_{\theta_i}$, $i = 1,\ldots,k$, where
$\theta_i$ is a real-valued parameter with a value in the set $\Theta$. It is assumed that
the $\theta_i$ are unknown. Let us denote the ordered $\theta_i$ by $\theta_{[1]} \le \theta_{[2]} \le \cdots \le \theta_{[k]}$
and the (unknown) population associated with $\theta_{[i]}$ by $\pi_{(i)}$, $i = 1,\ldots,k$.
The populations are ranked according to their $\theta$-values. To be specific,
$\pi_{(j)}$ is defined to be better than $\pi_{(i)}$ ($\pi_{(i)} \prec \pi_{(j)}$) if $i < j$ (that is,
$\theta_{[i]} \le \theta_{[j]}$). The experimenter is presumed to have no prior information
regarding the true pairing between $(\theta_1,\ldots,\theta_k)$ and $(\theta_{[1]},\ldots,\theta_{[k]})$. The
basic problem in the indifference zone approach is to select one of the $k$
populations with a guarantee that the probability of selecting the best
population, called the probability of a correct selection (PCS), is at
least $P^*$ ($1/k < P^* < 1$) whenever $\delta(\theta_{[k]},\theta_{[k-1]}) \ge \delta^*$; here $\delta(\theta_{[k]},\theta_{[k-1]})$ is
an appropriate measure of the separation of the best population $\pi_{(k)}$ and the
next best population $\pi_{(k-1)}$. The constants $P^*$ and $\delta^*$ are specified by the
experimenter in advance. The statistical problem is to define a selection
rule which really contains three parts: sampling rule, stopping rule for
sampling and decision rule. If the rule is based on a single sample of
fixed size $n$ from each population, then the minimum value of $n$ is determined so that
the specified minimum PCS can be guaranteed. As we stated above, this
guarantee is to be met when $\theta = (\theta_1,\ldots,\theta_k)$ belongs to a part of the
parameter space $\Omega$, namely, $\Omega_{\delta^*} = \{\theta: \delta(\theta_{[k]},\theta_{[k-1]}) \ge \delta^*\}$. The region
$\Omega_{\delta^*}$ is called the preference zone. It should be noted that no requirement
is made of the PCS when $\theta$ belongs to the complement of $\Omega_{\delta^*}$ in $\Omega$. It is this fact
that  led  to  the  original  label  of  'indifference  zone'  approach.  There  are 
several  variations  and  generalizations  of  the  basic  goal  discussed  above. 

For  details,  reference  can  be  made  to  Gupta  and  Panchapakesan  (1979). 

In the subset selection approach for selecting the best population
the goal is to select a nonempty subset of the $k$ populations so that the
best population is included in the selected subset with a minimum guaranteed
probability $P^*$ ($1/k < P^* < 1$). Here the size of the selected subset is not
determined in advance but by the data themselves. Selection of any subset
consistent with the goal (here selecting the best population) is called a
correct selection (CS) and the probability of a correct selection using a
rule $R$ is denoted by $P(CS \mid R)$. The requirement that

$P(CS \mid R) \ge P^*$  (1)

is referred to as the basic probability requirement or the $P^*$-condition.

Denoting the (random) selected subset by $S$, the requirement (1) can
be written in the form

$\Pr(S \ni \pi_{(k)}) \ge P^*$  (2)

which brings out its similarity to the probability statement associated with
a confidence interval procedure. While $P^*$ corresponds to the confidence
coefficient, the size of $S$, denoted by $|S|$, corresponds to the 'length' of
the  confidence  interval.  Thus  any  subset  selection  rule  for  'constructing'  S 
meets  the  criterion  of  validity  by  satisfying  (1)  or  (2)  and  |S|  serves  as  a 
measure  of  the  sensitivity  or  performance  of  the  rule.  It  should  also  be 
emphasized  that  in  the  subset  selection  framework  there  is  no  indifference 
zone  specification;  the  validity  criterion  or  the  P*-condition  must  be 
satisfied  whatever  be  the  configuration  of  the  unknown  parameters.  The 
configuration  of  the  parameters  which  yields  the  infimum  of  the  probability 
of a correct selection (PCS) is referred to as the least favorable configuration
(LFC).

Besides  being  a  goal  in  itself,  selecting  a  subset  containing  the  best 
can  also  serve  as  a  first-stage  screening  in  a  two-stage  procedure  designed 
to  choose  one  population  as  the  best;  see,  for  example,  Alam  (1970),  and 
Tamhane  and  Bechhofer  (1977). 

To point out some other differences between the indifference zone and
the subset selection approaches, consider the problem of selecting the
population associated with the largest mean from $k$ normal populations
with unknown means $\theta_1,\ldots,\theta_k$ and a common variance $\sigma^2$. When $\sigma^2$ is known,
Bechhofer (1954) proposed a single stage procedure based on samples of size $n$
each from the $k$ populations. When $\sigma^2$ is not known, a two-stage procedure
is necessary to guarantee the probability requirement using the indifference
zone approach. On the other hand, one can solve the problem by single stage
procedures for both cases in the subset selection approach. Also the subset
approach can be used when the sample size $n \ge 2$ has already been chosen
without regard to the type of analysis to be used for the data.

Besides the problem of selecting the best of $k$ given populations, another
problem that has been investigated from the early period is that of comparing $k$
experimental treatments (populations) with a standard or a control treatment.
The goal is to select a subset of the experimental treatments that contains
all treatments that are better than the standard or the control.

3.  EARLY  DEVELOPMENTS  (1950-1965) 

Early investigations of subset selection rules predictably centered
around well-known parametric families of distributions, namely, normal,
binomial and gamma. Gupta (1956) considered a procedure for selecting the
population with the largest mean from $k$ normal populations with means
$\theta_1,\ldots,\theta_k$ and a common variance $\sigma^2$. He considered the case of known as well
as unknown $\sigma^2$. Based on samples of size $n$ from these populations, his rule in
the case of known $\sigma^2$ is

$R_1$: Select $\pi_i$ if and only if $\bar{X}_i \ge \max_{1 \le j \le k} \bar{X}_j - d\sigma/\sqrt{n}$,  (3)

where $\bar{X}_i$ is the mean of the sample from $\pi_i$, $i = 1,\ldots,k$, and $d > 0$ is the smallest
constant such that the probability requirement (1) is satisfied. The smallest
constant $d$ satisfying the requirement is given by

$\inf_{\Omega} P(CS \mid R_1) = P^*$  (4)

where $\Omega$ denotes the parametric space. When $\sigma^2$ is not known, the rule $R_2$ of
Gupta (1956) is of the same form as $R_1$ except that $\sigma$ is replaced by $s$, where
$s^2$ is the usual pooled unbiased estimator of $\sigma^2$. Of course, the constant $d$
will have a different value now.
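
As a numerical illustration of how the constant $d$ in (3) and (4) might be obtained, the following Python sketch assumes the standard fact that the infimum of the PCS for $R_1$ occurs when all the means are equal, where the PCS reduces to $\int \Phi^{k-1}(z+d)\,\varphi(z)\,dz$ with $\Phi$ and $\varphi$ the standard normal distribution function and density. The function names, the Simpson integration grid and the bisection bracket are illustrative assumptions and not part of the original procedure.

```python
import math
from typing import List, Sequence


def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pcs_equal_means(k: int, d: float, steps: int = 4000) -> float:
    """PCS of R_1 when all k means are equal (assumed least favorable case).

    Evaluates the integral of Phi(z + d)^(k-1) * phi(z) over [-8, 8]
    by the composite Simpson rule; `steps` must be even.
    """
    a, b = -8.0, 8.0
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        z = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * Phi(z + d) ** (k - 1) * phi(z)
    return total * h / 3.0


def solve_d(k: int, p_star: float, tol: float = 1e-10) -> float:
    """Smallest d with pcs_equal_means(k, d) >= p_star, found by bisection."""
    lo, hi = 0.0, 20.0  # PCS is 1/k at d = 0 and essentially 1 at d = 20
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equal_means(k, mid) < p_star:
            lo = mid
        else:
            hi = mid
    return hi


def select_r1(means: Sequence[float], sigma: float, n: int, d: float) -> List[int]:
    """Indices (0-based) of the populations retained by rule R_1."""
    cutoff = max(means) - d * sigma / math.sqrt(n)
    return [i for i, m in enumerate(means) if m >= cutoff]
```

For $k = 2$ the integral has the closed form $\Phi(d/\sqrt{2})$, which gives a quick check on the numerical solution. The population with the largest sample mean is always retained, so the selected subset is never empty.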

For selecting the population with the largest scale parameter from $k$
gamma populations with (unknown) scale parameters $\theta_1,\ldots,\theta_k$ and a common known
shape parameter $\nu$, Gupta (1963) proposed the rule

$R_3$: Select $\pi_i$ if and only if $\bar{X}_i \ge c \max_{1 \le j \le k} \bar{X}_j$  (5)

where $\bar{X}_i$ is the mean of a random sample of size $n$ from $\pi_i$, $i = 1,\ldots,k$,
and $c \in (0,1)$ is to be determined to satisfy the $P^*$-requirement.

Rules such as $R_1$, $R_2$, and $R_3$ are all referred to as Gupta's
maximum type rules. Of course, these have their counterparts for the
problem of selecting the population with the smallest parameter of interest.
These maximum type rules have been investigated extensively in the literature;
their optimal properties have also been studied.

As opposed to the maximum type rules, average type rules were proposed by
Seal (1955, 1957, 1958a). In the case of selecting the normal population with
the largest mean when the common variance $\sigma^2$ is unknown, the average-type
rule is

$R_4$: Select $\pi_i$ if and only if $\bar{X}_i \ge \sum_{r=1}^{k-1} c_r \bar{X}_{[r]} - st$,  (6)

where $\bar{X}_{[1]} \le \bar{X}_{[2]} \le \cdots \le \bar{X}_{[k-1]}$ denote the ordered sample means after deleting
$\bar{X}_i$ ($i = 1,\ldots,k$), $s^2$ is the usual pooled unbiased estimator of $\sigma^2$, $c_1,\ldots,c_{k-1}$
are nonnegative constants subject to the constraint $\sum_{i=1}^{k-1} c_i = 1$, and
$t = t(k,P^*,c_1,\ldots,c_{k-1})$ is chosen to satisfy the $P^*$-requirement. However,
as we will discuss later, the maximum type rules are found to be approximately
Bayes optimal under reasonable loss functions. The additional simplicity in
determining the constants associated with these rules makes them more appealing
and useful.

The initial investigations of the rules for normal means, normal variances
and gamma scale parameters were concerned with derivations of the properties of
the rules such as monotonicity and of results relating to the supremum of the
expected size of the selected subset for these specific distributions.

The first paper in the direction of a unified treatment was by Gupta (1965)
who treated selection in terms of location and scale parameters. It was assumed
that the selection statistics used in the rule have distributions differing in
a location or a scale parameter. Let $T_i$ be the statistic associated with the
sample from $\pi_i$, $i = 1,\ldots,k$. Then the distributions of the $T_i$ are $F(x-\theta_i)$,
$i = 1,\ldots,k$, or $F(x/\theta_i)$, $i = 1,\ldots,k$, where the $\theta_i$ are the parameters that
are to be ranked. The rules investigated by Gupta (1965) are $R_5$ (location case)
and $R_6$ (scale case) given below.

$R_5$: Select $\pi_i$ if and only if

$T_i \ge \max_{1 \le j \le k} T_j - d$  (7)

and

$R_6$: Select $\pi_i$ if and only if

$T_i \ge c \max_{1 \le j \le k} T_j$  (8)

where $d > 0$ and $c \in (0,1)$ are to be determined so that the $P^*$-requirement
is satisfied.
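
The two rules (7) and (8) translate directly into code once the statistics and the constant are in hand. The Python sketch below is only an illustration: the function names are hypothetical, the constants $d$ and $c$ are taken as given rather than computed from the $P^*$-requirement, and the scale rule assumes positive statistics.

```python
from typing import List, Sequence


def select_location(t: Sequence[float], d: float) -> List[int]:
    """Rule (7): retain index i when T_i >= max_j T_j - d, with d > 0."""
    cutoff = max(t) - d
    return [i for i, ti in enumerate(t) if ti >= cutoff]


def select_scale(t: Sequence[float], c: float) -> List[int]:
    """Rule (8): retain index i when T_i >= c * max_j T_j, with 0 < c < 1.

    Assumes all statistics are positive, as for scale-parameter families.
    """
    cutoff = c * max(t)
    return [i for i, ti in enumerate(t) if ti >= cutoff]


# Example with three populations (0-based indices).
stats = [1.0, 2.5, 3.0]
location_subset = select_location(stats, d=1.0)  # cutoff is 2.0
scale_subset = select_scale(stats, c=0.5)        # cutoff is 1.5
```

In both cases the population with the largest statistic always satisfies the inequality, so the selected subset is nonempty, and a larger $d$ (or a smaller $c$) can only enlarge it.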

Gupta  (1965)  showed  that  the  infimum  of  the  PCS  is  attained  in  either 
case  when  the  parameters  are  equal  and  this  infimum  is  independent  of  their 
common  value.  He  also  established  some  important  properties  that  are  enjoyed 
by  both  procedures.  These  are: 

(1) The procedures are monotone, i.e., for $\theta_i > \theta_j$, the probability of
including $\pi_i$ in the selected subset is at least as large as that of including
$\pi_j$.

(2) The probability of selecting the best population in the selected
subset of size $|S|$ (not known in advance) is maximum among all possible subsets
of size $|S|$.

(3) If the density $f(x,\theta)$ possesses a monotone likelihood ratio in $x$,
then $E(|S|)$ is maximized over all parametric configurations when the $\theta_i$ are
equal and this maximum is $kP^*$.

For selecting the binomial population with the largest success probability,
Gupta and Sobel (1960) proposed a location type rule. Let $X_i$ be the number of
successes in $n$ trials associated with $\pi_i$, $i = 1,\ldots,k$. Their rule is

$R_7$: Select $\pi_i$ if and only if

$X_i \ge \max_{1 \le j \le k} X_j - d$  (9)

where $d$ is the smallest nonnegative integer for which the $P^*$-requirement is met.

An interesting aspect of this procedure $R_7$ is that the infimum of the
PCS occurs when all the parameters are equal but it is not independent of
their common value, say, $p$. For $k = 2$, Gupta and Sobel (1960) showed that
the infimum of PCS over $p$ occurs when $p = 1/2$. When $k > 2$, the common value $p$
for which this infimum takes place is not known. However, it is known that
this common value $p \to 1/2$ as $n \to \infty$. This difficulty regarding the infimum of
the PCS led to the investigations of conditional selection rules which will
be discussed in the next section.
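
The dependence of the PCS on the common value $p$ can be seen numerically for $k = 2$. The Python sketch below is an illustration under stated assumptions: with $p_1 = p_2 = p$ and one population tagged as the best, the PCS of rule (9) is $\Pr(X_2 \ge X_1 - d)$ for independent binomial counts, computed here by direct summation. The function names are hypothetical.

```python
from math import comb
from typing import List


def binom_pmf(n: int, p: float) -> List[float]:
    """Binomial(n, p) probabilities for x = 0, ..., n."""
    return [comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(n + 1)]


def pcs_equal_p(n: int, d: int, p: float) -> float:
    """PCS of rule (9) for k = 2 when both success probabilities equal p.

    Population 2 is tagged as the best; it is selected when X_2 >= X_1 - d.
    """
    pmf = binom_pmf(n, p)
    return sum(
        pmf[x1] * pmf[x2]
        for x1 in range(n + 1)
        for x2 in range(n + 1)
        if x2 >= x1 - d
    )


# Scan the common value p on a coarse grid for n = 10 trials and d = 2.
grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
pcs_on_grid = {p: pcs_equal_p(10, 2, p) for p in grid}
```

On such a grid the smallest value is expected at $p = 1/2$, in line with the result quoted above for $k = 2$; the scan is only a numerical check and does not address $k > 2$.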

The investigations of this early period were mainly under the assumption
that the sample sizes are equal and that the nuisance parameters (such as
$\sigma^2$ for the normal means problem) are equal.

Besides  the  problem  of  selecting  the  best  of  k  given  populations,  procedures 
were  proposed  and  investigated  also  for  the  problem  of  selecting  a  subset 
containing  all  the  populations  that  are  better  than  a  control,  and  that  of 
partitioning  a  set  of  populations  with  respect  to  a  control.  The  early 
contributors are Bhattacharya (1956), Gupta and Sobel (1958), and Seal (1958b).

4. YEARS OF MAIN GROWTH (1965-1975)

This  period  witnessed  a  very  significant  growth  of  the  ranking  and 
selection  theory,  in  general,  and  the  subset  selection  theory,  in  particular. 

This  period  also  marks  the  advent  of  the  'second  generation'  of  researchers 
coming  mostly  out  of  Cornell  University,  University  of  Minnesota  and  Purdue 
University.  The  research  during  this  period  encompassed  many  facets  of  the 
subset  selection  theory.  The  main  developments  during  this  period  can  be 
broadly categorized into (i) unified results for the existing theory, (ii)
generalizations and modifications in the formulation of the problem and the
goal, (iii) decision-theoretic formulations, Bayes and empirical Bayes procedures,
(iv) selection procedures for multivariate normal and multinomial populations,
(v) development of conditional procedures, (vi) nonparametric procedures,
(vii) selection from restricted families and (viii) sequential procedures.

As  one  can  see,  many  of  the  developments  that  took  place  in  the  theory  had 
their  beginnings  in  this  period.  We  will  discuss  these  briefly.  For  more 
details  on  these  results,  the  reader  is  referred  to  Gupta  and 
Panchapakesan  (1979). 

4.1 Unified Theory

In  Section  3,  we  referred  to  Gupta  (1965)  who  presented  unified  results 
for  location  and  scale  parameter  families.  Later,  these  results  were  given 
in a more general form by Gupta (1966). This was followed by a more comprehensive
unified theory by Gupta and Panchapakesan (1972). Let $\pi_1,\ldots,\pi_k$ have
absolutely continuous distributions $F_{\theta_1},\ldots,F_{\theta_k}$, respectively, where the $\theta_i$
belong to an open interval $\Theta$ of the real line. It is assumed that $\{F_\theta\}$,
$\theta \in \Theta$, is a stochastically increasing family in $\theta$. Let $h = \{h_{c,d}\}$, $c \in [1,\infty)$,
$d \in [0,\infty)$, be a class of real-valued functions defined on the real line satisfying
the following conditions: For every $x$ belonging to the support of $F_\theta$,
(i) $h_{c,d}(x) \ge x$, (ii) $h_{1,0}(x) = x$, (iii) $h_{c,d}(x)$ is continuous in $c$ and $d$, and
(iv) $\lim_{d \to \infty} h_{c,d}(x) = \infty$ for $c$ fixed, and/or $\lim_{c \to \infty} h_{c,d}(x) = \infty$ for $d$ fixed, $x \ne 0$.

Using the above class of functions $h$, Gupta and Panchapakesan (1972)
considered the following class of procedures whose typical member is denoted by

$R_h$: Select the population $\pi_i$ if and only if

$h(x_i) \ge \max_{1 \le j \le k} x_j$,  (10)

where $x_i$ is an observation from $\pi_i$, $i = 1,\ldots,k$. The PCS is minimized when
$\theta_1 = \cdots = \theta_k = \theta$. In general, the value of the PCS depends on $\theta$. Under certain
regularity conditions (see Gupta and Panchapakesan, 1979, p. 206) Gupta and
Panchapakesan (1972) obtained a sufficient condition for the PCS to be
monotonically increasing (or decreasing) in $\theta$. When $\theta$ is a location or a
scale parameter, the PCS is independent of $\theta$. Gupta and Panchapakesan also
obtained a sufficient condition for the supremum of the expected subset size
to take place when the parameters are equal. This latter sufficient condition
implies the one for the monotonicity of the PCS in $\theta$. Besides the cases of
location and scale parameters earlier discussed by Gupta (1965), the general
results have been applied to the case where the density $f_\theta(x)$ is a convex
mixture of the form $\sum_{j=0}^{\infty} w(\theta,j)\, g_j(x)$. Here $g_j(x)$, $j = 0,1,\ldots$, is a sequence
of density functions and the $w(\theta,j)$ are nonnegative weights such that
$\sum_{j=0}^{\infty} w(\theta,j) = 1$. The results for the convex mixture directly apply to the
procedures for selection from multivariate normal populations by Gupta and
Panchapakesan (1969) in terms of the multiple correlation coefficient of one
component with respect to the others, and by Gupta and Studden (1970) in
terms of the Mahalanobis distance function. It should also be noted that
the class of functions $h$ includes the usual choices made earlier, namely,
$h(x) = cx$, $c \ge 1$, and $h(x) = x+d$, $d \ge 0$. The class also includes
$h(x) = cx+d$, $c \ge 1$, $d \ge 0$, which was used by some authors later.


4.2  Generalizations  and  Modifications 

Deverman and Gupta (1969) considered a generalization of the basic subset
selection goal. Let $\theta_{[1]} \le \cdots \le \theta_{[k]}$ be the ordered parameters of $k$ populations.
The populations associated with the $t$ largest $\theta_i$'s are the $t$ best populations.
Any subset of a fixed size $s$ is called an $s$-subset. The goal is to select a
subcollection of the collection of all the $\binom{k}{s}$ $s$-subsets with a minimum
guaranteed probability $P^*$ that the chosen subcollection contains at least one
$s$-subset having at least $c$ of the $t$ best populations. Obviously, for a meaningful
problem, the integers $c$, $s$, $t$, and $k$ must be such that $k \ge 2$ and
$\max(1, s+t+1-k) \le c \le \min(s,t)$. Also, $P^* \ge \sum_{i=c}^{\min(s,t)} \binom{t}{i}\binom{k-t}{s-i} / \binom{k}{s}$. When
$s = t = c = 1$, we get the basic problem of selecting a subset to contain the
best.
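
The lower bound on $P^*$ stated above is the probability that an $s$-subset drawn at random contains at least $c$ of the $t$ best populations, a hypergeometric tail. The short Python sketch below evaluates it; the function name and the argument checks are illustrative assumptions.

```python
from math import comb


def random_subset_bound(k: int, s: int, t: int, c: int) -> float:
    """Probability that a randomly chosen s-subset of k populations
    contains at least c of the t best ones (hypergeometric tail)."""
    if not (k >= 2 and 1 <= s <= k and 1 <= t <= k):
        raise ValueError("need k >= 2 and 1 <= s, t <= k")
    if not (max(1, s + t + 1 - k) <= c <= min(s, t)):
        raise ValueError("need max(1, s+t+1-k) <= c <= min(s, t)")
    favorable = sum(
        comb(t, i) * comb(k - t, s - i) for i in range(c, min(s, t) + 1)
    )
    return favorable / comb(k, s)


# Basic problem (s = t = c = 1) with k = 5 populations: the bound is 1/k.
basic_bound = random_subset_bound(5, 1, 1, 1)
```

With $s = t = c = 1$ the bound reduces to $1/k$, which matches the range $1/k < P^* < 1$ used in the basic formulation.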


In the basic formulation we select a nonempty subset of the $k$ given
populations. When the parameters $\theta_i$ are all very close to one another, we
are likely to select all the populations. So it is meaningful to put a
restriction that the size of the selected subset will not exceed $m$ ($1 < m < k$).
Even otherwise, one may want to select a nonempty subset of a random size
subject to a maximum of $m$. Such a formulation is called a restricted subset
size formulation. The general theory was developed by Santner (1973, 1975)
and the normal means selection problem was investigated by Gupta and Santner
(1973). An important feature of this formulation is that an indifference
zone is introduced. The minimum guaranteed PCS is required when the parametric
vector $\theta = (\theta_1,\ldots,\theta_k)$ belongs to the preference zone. The minimum sample size
$n$ and the constant associated with the selection rule are to be determined.
The general theory of Santner (1975) formally reduces to give the results
of Bechhofer (1954) for $m = 1$ and those of Gupta (1956, 1965) for $m = k$ if
the indifference zone is allowed to vanish.

To illustrate the restricted subset selection problem, consider $k$ normal
populations with unknown means $\mu_1,\ldots,\mu_k$ and a common known variance $\sigma^2$.
We want to select a subset of size not exceeding $m$ ($1 < m < k$) such that
the best population (the one associated with $\mu_{[k]}$) is selected with a
probability at least equal to $P^*$ whenever $\mu_{[k]} - \mu_{[k-1]} \ge \delta$, where $\delta > 0$ is
specified in advance. The rule of Gupta and Santner (1973) is


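$R_8$: Select $\pi_i$ if and only if

$\bar{X}_i \ge \max\{\bar{X}_{[k-m+1]},\ \bar{X}_{[k]} - d\sigma/\sqrt{n}\}$,  (11)

where $\bar{X}_i$ and $\bar{X}_{[1]} \le \cdots \le \bar{X}_{[k]}$ are the unordered and the ordered sample
means based on samples of size $n$. For a specified value of $\delta$, $d$ will depend
on $k$, $P^*$, and $n$.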

Another modification is to relax the goal of selecting the best population.
If $\theta_1,\ldots,\theta_k$ are the ranking parameters, one may be content with
selecting populations that are nearly as good as the best (the one associated
with $\theta_{[k]}$). Lehmann (1963a) used this idea though not for a subset selection
goal. Priority in introducing this concept goes to Fabian (1962) who defined
a $\Delta$-correct ranking for the problem of Bechhofer (1954). Let us consider the
case of location parameters. Lehmann (1963a) defined a good population as
any population for which $\theta_i \ge \theta_{[k]} - \Delta$, $\Delta > 0$. Desu (1970) defined
superior and inferior populations by $\theta_i \ge \theta_{[k]} - \Delta_1$ and $\theta_i \le \theta_{[k]} - \Delta_2$,
respectively, where $0 < \Delta_1 < \Delta_2$. His goal is to select a nonempty subset
of the $k$ given populations that excludes all inferior populations with a
minimum guaranteed probability $P^*$. The performance of a procedure
satisfying the $P^*$-requirement can be evaluated, for example, by the
expected number of superior populations included in the selected subset.
Carroll, Gupta and Huang (1975) considered eliminating inferior populations
with respect to the $t$ best, i.e., those $\pi_i$'s for which $\theta_i \le \theta_{[k-t+1]} - \Delta$,
$\Delta > 0$. They called these populations strictly non-$t$-best. These definitions
are modified in an obvious way to handle scale parameters. Panchapakesan
and Santner (1977) introduced a generalization by defining a good population
relative to the $t$-th best as one for which $\theta_i \ge \rho(\theta_{[k-t+1]})$ where $\rho$ is a
function possessing certain general properties. They considered two
goals: (i) selecting a nonempty subset containing only good ones, and (ii)
selecting a subset whose size does not exceed $m$ ($1 < m < k$) and which will
include at least one good population. Their treatment complements the
unified results of Gupta and Panchapakesan (1972) and Santner (1975).

4.3 Decision-theoretic Formulation; Bayes and Empirical Bayes Procedures

During this period of main growth, the early contributions to the
decision-theoretic formulation were made. Some Bayes and empirical Bayes
procedures  were  derived.  It  may  be  felt  that  these  early  contributions 
were  modest  compared  to  the  growth  of  the  literature  on  the  classical 
procedures  during  this  period.  However,  they  gave  the  impetus  to  the 
developments  that  would  follow  in  the  subsequent  periods. 

Now, to describe the decision-theoretic setup, let $\pi_i$ ($i = 1,\ldots,k$)
be described by the probability space $(\mathcal{X}, \mathcal{G}, P_i)$, where $P_i$ belongs to some
family $\mathcal{P}$ of probability measures. Let us assume that the family $\mathcal{P}$ is
stochastically ordered; in other words, there is a stochastic ordering
between any pair $(P_i,P_j)$ from $\mathcal{P}$. The stochastically largest among
$\pi_1,\ldots,\pi_k$ is the best population. In the case of more than one contender,
we assume that one of them is tagged as the best. We observe $\mathbf{X} = (X_1,\ldots,X_k)$
where $X_i$ is an observation from $\pi_i$, $i = 1,\ldots,k$. The space of the observation
$\mathbf{X}$ is $\mathcal{X}^k = \{\mathbf{x} = (x_1,\ldots,x_k): x_i \in \mathcal{X},\ i = 1,\ldots,k\}$. The decision space $\mathcal{D}$ consists
of the $2^k$ subsets $d$ of the set $\{1,2,\ldots,k\}$; in other words, $\mathcal{D} = \{d \mid d \subseteq \{1,\ldots,k\}\}$.
Thus a decision $d$ corresponds to the selection of a subset (possibly the empty
set) of the $k$ given populations. Any decision $d \in \mathcal{D}$ is a correct selection if
$j \in d$ where $\pi_j$ is the best population. A selection procedure is a measurable
function $\delta$ defined on $\mathcal{X}^k \times \mathcal{D}$ such that for each $\mathbf{x} \in \mathcal{X}^k$, we have $\delta(\mathbf{x},d) \ge 0$ for
any $d \in \mathcal{D}$ and $\sum_{d \in \mathcal{D}} \delta(\mathbf{x},d) = 1$. Here $\delta(\mathbf{x},d)$ is the probability that the subset $d$
is selected when $\mathbf{x}$ is observed. The individual selection probability $p_i(\mathbf{x})$
for the population $\pi_i$ is given by $p_i(\mathbf{x}) = \sum_{d \ni i} \delta(\mathbf{x},d)$, the summation being over
all subsets that contain $i$. While, in general, the individual selection
probabilities do not uniquely determine the selection procedure $\delta(\mathbf{x},d)$, they
do so when the $p_i(\mathbf{x})$ take on only values 0 and 1 (see Gupta and Panchapakesan,
1979, p. 212).
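
The relation between a selection procedure and its individual selection probabilities can be made concrete for a fixed observation. In the Python sketch below, a procedure at a given $\mathbf{x}$ is represented, as an assumption made purely for illustration, by a dictionary mapping each subset (a frozenset of population labels) to its probability; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List


def individual_probs(delta: Dict[FrozenSet[int], float], k: int) -> List[float]:
    """Individual selection probabilities p_1, ..., p_k at a fixed x.

    `delta` maps subsets of {1, ..., k} to probabilities that sum to 1;
    subsets not listed are taken to have probability 0.
    """
    if abs(sum(delta.values()) - 1.0) > 1e-12:
        raise ValueError("subset probabilities must sum to 1")
    return [
        sum(prob for subset, prob in delta.items() if i in subset)
        for i in range(1, k + 1)
    ]


# Two different procedures for k = 2 with identical individual probabilities.
delta_a = {frozenset({1}): 0.5, frozenset({2}): 0.5}
delta_b = {frozenset(): 0.5, frozenset({1, 2}): 0.5}
probs_a = individual_probs(delta_a, 2)
probs_b = individual_probs(delta_b, 2)
```

Both procedures give $p_1 = p_2 = 1/2$ although they randomize over different subsets, which illustrates why the individual selection probabilities do not in general determine $\delta$. When every $p_i$ is 0 or 1 the selected subset is fixed, and the ambiguity disappears.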

Studden (1967) studied optimum selection rules assuming that
$\theta = (\theta_1,\ldots,\theta_k)$ is a permutation of a $k$-vector of known elements. He
assumed a loss function $L(\theta,d) = \sum_{i \in d} L_i(\theta) + L(1-I)$, where $L_i(\theta)$ is the loss
whenever $\pi_i$ is selected and $I = 1$ or 0 according as a correct selection is
or is not made. This loss function is also assumed to be
permutation-invariant. Studden (1967) obtained the best (in the sense of minimizing the
risk) invariant selection rule.

Nagel (1970) defined a concept of just selection rules. Suppose that
$>$ defines a partial order in $\mathcal{X}$. We say $y$ is preferable to $x$ if $y > x$. A
selection rule $R$, defined by its individual selection probabilities
$p_i(x)$, $i = 1,\ldots,k$, is called just if and only if


Let $\Omega$ denote the space of the parameter vector $\theta$ and $\Omega_0$ denote the part of
$\Omega$ in which all the parameters are equal. Nagel (1970) showed that, under
appropriate ordering on the parameter space, for any just rule $R$

$\inf_{\Omega} P(CS \mid R) = \inf_{\Omega_0} P(CS \mid R)$,  (12)

which is a reasonable property to impose on a rule. Nagel also showed that
a permutation-invariant just rule is monotone.

Deely  and  Gupta  (1968)  obtained  Bayes  procedures  considering  linear 
loss  functions  of  the  type 


(13) 


where  S  denotes  the  set  of  indices  of  the  selected  populations.  Deely  (1965) 
investigated  empirical  Bayes  procedures  and  derived  these  procedures  in  several 
special  cases. 

4.4  Selection  Procedures  for  Multivariate  Normal  and  Multinomial  Distributions 

Several problems were investigated relating to the best component of the
mean vector of a single multivariate normal population and the best of several
multivariate normal populations. For ranking several multivariate normal
populations, several criteria were used, such as the Mahalanobis distance
function (Alam and Rizvi, 1966; Gupta, 1966; Gupta and Studden, 1970),
generalized variance (Gnanadesikan and Gupta, 1970), and multiple correlation
coefficient between a particular component and the remaining ones (Gupta and
Panchapakesan, 1969). However, in some of these problems the exact infimum of
the PCS was not established in general. For selecting the best component of a
single multivariate normal population, Gnanadesikan (1966) considered a
location type procedure based on sample component means. Except in the case of
bivariate normal, only a lower bound of the PCS is used to obtain a
conservative value of the constant to be used in the procedure even in the case
of known correlation matrix $\Sigma$. The difficulty is due to the fact that
the association between the ranked components and the known correlations is
unknown. If we assume that the components have the same variance and are
equally correlated with correlation $\rho > 0$, then the exact solution is
available (Gupta, Nagel and Panchapakesan, 1973). For selecting the best of
several p-variate normal distributions, $N(\mu_i, \Sigma_i)$, i = 1,...,k, in
terms of the Mahalanobis distance function, Gupta (1966) and Gupta and Studden
(1970) proposed procedures when the covariance matrices are known as well as
when they are unknown. The case of common $\Sigma$ ($\Sigma_i = \Sigma$ for
all i) was not solved. This was later solved in an approximate sense by
Chattopadhyay (1981). A few other measures were considered (Frischtak, 1973;
Gnanadesikan, 1966) for ranking multivariate normal populations but the results
in these cases are very limited in scope or are asymptotic in nature. For
selecting the populations better than a standard, Krishnaiah and Rizvi (1966)
considered, as criteria, linear combinations of the elements of mean vectors
and distance functions whereas Krishnaiah (1967) considered linear combinations
of the elements of the covariance matrices.

For selecting the most (or the least) probable cell in a multinomial
distribution, Gupta and Nagel (1967) proposed a single stage procedure. Let
$X_1,\ldots,X_k$ denote the cell counts based on n independent observations
from a k-cell multinomial distribution with unknown cell probabilities
$p_1,\ldots,p_k$. Gupta and Nagel (1967) proposed and investigated the
following rules $R_9$ and $R_{10}$ for selecting the cell with the largest and
the smallest $p_i$, respectively.


$R_9$: Select $\pi_i$ if and only if

$$X_i \ge \max_{1 \le j \le k} X_j - D \tag{14}$$

where D = D(k,n,P*) is the smallest nonnegative integer for which the
P*-condition is satisfied.

$R_{10}$: Select $\pi_i$ if and only if

$$X_i \le \min_{1 \le j \le k} X_j + C \tag{15}$$

where C = C(k,n,P*) is the smallest nonnegative integer for which the
P*-condition is satisfied.
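As a minimal sketch of how rules of the form (14) and (15) act on observed cell counts, the following applies them for given constants. The constants D and C are taken as inputs here; finding the smallest values that meet the P*-condition is not attempted, and the function names are illustrative only.

```python
def select_largest_cell(counts, D):
    """Rule of type (14): keep cell i when X_i >= max_j X_j - D."""
    top = max(counts)
    return [i for i, x in enumerate(counts, start=1) if x >= top - D]


def select_smallest_cell(counts, C):
    """Rule of type (15): keep cell i when X_i <= min_j X_j + C."""
    bottom = min(counts)
    return [i for i, x in enumerate(counts, start=1) if x <= bottom + C]


counts = [12, 7, 10, 3]      # hypothetical cell counts, n = 32
largest = select_largest_cell(counts, D=2)
smallest = select_smallest_cell(counts, C=4)
```

With D = 0 (or C = 0) only the cells tied for the extreme count are retained, so both rules always select a nonempty subset.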

The first interesting point to emerge about $R_9$ and $R_{10}$ is that, unlike
in the cases of earlier problems such as normal means, normal variances, etc.,
the analysis in the minimum case does not exactly parallel that in the maximum
case. Also, for k > 2, the LFC was not completely determined. Gupta and Nagel
(1967) showed that the LFC (in terms of the ordered $p_i$) is of the type
$(0,\ldots,0,s,p,\ldots,p)$, $s \le p$, in the case of $R_9$ and is of the
type $(p,\ldots,p,q)$, $p \le q$, in the case of $R_{10}$. An alternative to
$R_9$ is the inverse sampling selection rule of Panchapakesan (1971, 1973) for
which the infimum of the PCS occurs when all the cell probabilities are equal.

Multinomial selection rules are also important in the sense that they
provide distribution-free procedures. Let $\pi_1,\ldots,\pi_k$ have continuous
distributions $F_{\theta_i}$, i = 1,...,k, which belong to a stochastically
increasing family in $\theta$. Let $p_i$ denote the probability that in a set
of k observations, one from each distribution, the observation from $\pi_i$ is
the largest, i = 1,...,k. Selecting the stochastically largest population is
equivalent to selecting the population associated with the largest $p_i$. By
taking observations, a vector at a time, and noting which population yielded
the largest observation, we can convert the problem, in an obvious manner, to
that of selecting the most probable multinomial cell.
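The conversion described above can be sketched as follows. The data layout (one tuple per sampling stage, with one observation from each population) and the function name are assumptions made for illustration.

```python
def largest_observation_counts(vectors):
    """Count, for each population, how often it gave the largest observation.

    vectors is a list of k-tuples, one observation per population per stage.
    Ties have probability zero for continuous distributions; if one occurs
    in the data, this sketch credits the lowest-indexed population.
    """
    k = len(vectors[0])
    counts = [0] * k
    for v in vectors:
        winner = max(range(k), key=lambda i: v[i])
        counts[winner] += 1
    return counts


# Four hypothetical stages of sampling from k = 3 populations.
data = [(0.3, 1.2, 0.7), (2.1, 0.4, 0.9), (0.5, 1.8, 1.1), (0.2, 0.6, 1.4)]
cell_counts = largest_observation_counts(data)
```

The resulting counts play the role of multinomial cell counts, to which a multinomial selection rule can then be applied.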

4.5 Conditional Selection Procedures

In Section 3, we saw that, for the Gupta-Sobel rule [$R_7$ defined by
(9)], the infimum of the PCS occurs when all the success probabilities
associated with the k binomial populations are equal to p, but this
common value p at which the infimum takes place is not known when k > 2.
Thus there was no result earlier giving a reasonable conservative value of
the associated constant for any given n. Similar difficulties arise also
with procedures for Poisson populations. There are also a few other
interesting points about the usual procedures in this case. Let us briefly
mention them here.

Let $X_1,\ldots,X_k$ denote the numbers of occurrences from k Poisson
populations with parameters $\lambda_1,\ldots,\lambda_k$, respectively. Suppose
we want to select the population with the largest $\lambda_i$. Here the usual
location and scale type procedures cannot be found to satisfy the P*-condition
for all permissible values of P*. Gupta and Huang (1975a) proposed a modified
procedure $R_{11}$ which selects $\pi_i$ if and only if

$$X_i + 1 \ge c \max_{1 \le j \le k} X_j \tag{16}$$

where 0 < c = c(k,P*) < 1 is to be chosen subject to the P*-requirement.
The motivation behind this procedure comes from a result of Chapman (1952)
which says that there is no unbiased estimator of $\lambda_1/\lambda_2$ but
$X_1/(X_2+1)$ is "almost unbiased." Gupta and Huang (1975a) have shown that
the infimum of the PCS occurs when $\lambda_1 = \cdots = \lambda_k = \lambda$;
however, the common value $\lambda$ at which the infimum occurs is not
established.
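A rule of the form $X_i + 1 \ge c \max_j X_j$ is easy to apply once c is fixed. The sketch below assumes c is supplied (for instance from tables) and makes no attempt to compute c(k,P*); the function name and the counts are hypothetical.

```python
def poisson_modified_select(counts, c):
    """Select population i when X_i + 1 >= c * max_j X_j, with 0 < c < 1."""
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    threshold = c * max(counts)
    return [i for i, x in enumerate(counts, start=1) if x + 1 >= threshold]


selected = poisson_modified_select([4, 9, 6], c=0.6)
```

Adding 1 to $X_i$ keeps the rule informative when the largest count is zero: every population then satisfies the inequality, so the subset is never empty.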
Since the common value of the parameters at which the infimum of the PCS
occurs  is  not  known  for  these  rules  for  the  binomial  and  Poisson  populations, 
the  natural  question  is:  Can  we  find  conservative  values  for  the  constants 
defining  the  procedures?  An  answer  in  the  affirmative  follows  from  the  use 
of  conditional  selection  rules  which  form  a  part  of  the  important  contributions 
of  the  period  under  review. 

Gupta and Nagel (1971) first proposed conditional subset selection rules
in the case of binomial, Poisson, and negative binomial populations. Their
rules are randomized just rules, so they satisfy (12). For selecting the
binomial population with the largest success probability $\theta_i$, their
rule $R_{12}$ is given by the individual selection probabilities

$$p_i(x) = \begin{cases} 1 & \text{if } x_i > c_t, \\ \rho & \text{if } x_i = c_t, \\ 0 & \text{if } x_i < c_t, \end{cases} \qquad i = 1,\ldots,k, \tag{17}$$

where $t = x_1 + \cdots + x_k$ and the constants $\rho$ and $c_t$ are
determined to satisfy

$$E(p_k(X) \mid T = t) = P^* \tag{18}$$

where $T = X_1 + \cdots + X_k$. The important fact to note about $R_{12}$ and
the similar Gupta-Nagel randomized procedures for Poisson and negative binomial
populations is that the infimum of the PCS takes place when the parameters
under consideration of the k populations are equal and the constant associated
with the rule (depending on the value of the statistic T on which the
conditioning is done) is independent of the common value of the parameters.
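To see why conditioning removes the dependence on the common parameter, consider k independent binomial counts with the same number of trials n and the same success probability. Given their total T = t, any one count has a hypergeometric distribution that does not involve the success probability. The sketch below uses that fact to solve a condition of the form $P(X_k > c_t \mid t) + \rho P(X_k = c_t \mid t) = P^*$ exactly. Equal sample sizes, evaluation at the equal-parameter configuration, and all function names are assumptions of this illustration, not details taken from Gupta and Nagel (1971).

```python
from fractions import Fraction
from math import comb


def conditional_pmf(x, n, k, t):
    """P(X_k = x | T = t) for k i.i.d. Binomial(n, p) counts; free of p."""
    if x < 0 or x > n or t - x < 0 or t - x > (k - 1) * n:
        return Fraction(0)
    return Fraction(comb(n, x) * comb((k - 1) * n, t - x), comb(k * n, t))


def conditional_constants(n, k, t, p_star):
    """Return (c_t, rho) with P(X_k > c_t | t) + rho * P(X_k = c_t | t) = p_star."""
    if not 0 < p_star <= 1:
        raise ValueError("p_star must be in (0, 1]")
    tail = Fraction(0)                      # P(X_k > c | T = t)
    for c in range(min(n, t), -1, -1):
        pmf = conditional_pmf(c, n, k, t)
        if tail + pmf >= p_star:            # boundary value found
            return c, (p_star - tail) / pmf
        tail += pmf
    raise ValueError("no solution; t must lie between 0 and k * n")


def selection_probabilities(x, c_t, rho):
    """Individual selection probabilities of the three-case form in the text."""
    return [1 if xi > c_t else rho if xi == c_t else 0 for xi in x]


c_t, rho = conditional_constants(n=2, k=2, t=2, p_star=Fraction(3, 4))
probs = selection_probabilities([1, 1], c_t, rho)
```

For n = 2, k = 2, t = 2 the conditional probabilities of 0, 1, 2 are 1/6, 4/6, 1/6, so P* = 3/4 gives $c_t = 1$ and $\rho = 7/8$, whatever the common success probability is.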

For the binomial selection problem Gupta, Huang and Huang (1976) proposed
a nonrandomized conditional rule

$R_{13}$: Select $\pi_i$ if and only if

$$X_i \ge \max_{1 \le j \le k} X_j - D(t) \quad \text{given } T = \sum_{i=1}^{k} X_i = t, \tag{19}$$

where $D(t) \ge 0$ is to be chosen to satisfy the P*-condition. An exact
result for the infimum of the PCS is obtained only for k = 2; in this case, the
infimum is attained when $p_1 = p_2 = p$ and is independent of the common value
p. For k > 2, Gupta, Huang and Huang (1976) obtained a conservative value for
D(t). They also obtained a conservative value for the constant d of the
unconditional rule $R_7$ in Section 3. It should be noted that, in using the
conditioning argument to obtain a conservative value of d, one can always
guarantee the P*-condition. The values of the constant d tabulated by Gupta
and Sobel (1960) for k > 3 are based on normal approximation and thus may
lead to a drop of the PCS below P*.

Conditional rules for Poisson populations were given by Gupta and Huang
(1975a). These are similar to $R_{11}$ given by (16) with c(t) in the place of
c, given that $T = \sum_{i=1}^{k} X_i = t$. It is well known that, if
$X_1,\ldots,X_k$ are independent Poisson variables with parameters
$\lambda_1,\ldots,\lambda_k$, respectively, then the conditional joint
distribution of $X_1,\ldots,X_k$ given $X_1 + \cdots + X_k = N$ is multinomial
with cell probabilities $p_i = \lambda_i / \sum_j \lambda_j$. So the
conditional selection rule for Poisson can be exploited to provide a selection
rule for selecting the most probable multinomial cell which selects the cell
$\pi_i$ if and only if $X_i + 1 \ge c \max_{1 \le j \le k} X_j$, where
$c = c(k,N,P^*) \in (0,1)$. Gupta and Huang (1975a) did propose this rule. A
conservative value of c can be obtained from their results for the conditional
selection rule for Poisson populations.
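The Poisson-to-multinomial fact quoted above follows in two lines from the independence of the counts and the standard result that their sum is Poisson with parameter $\Lambda = \sum_i \lambda_i$ (the symbol $\Lambda$ is introduced here only for the derivation):

```latex
\begin{align*}
P\Bigl(X_1 = x_1,\ldots,X_k = x_k \Bigm| \sum_{i=1}^{k} X_i = N\Bigr)
  &= \frac{\prod_{i=1}^{k} e^{-\lambda_i}\lambda_i^{x_i}/x_i!}
          {e^{-\Lambda}\Lambda^{N}/N!},
     \qquad \Lambda = \sum_{i=1}^{k}\lambda_i, \\
  &= \frac{N!}{x_1!\cdots x_k!}\,
     \prod_{i=1}^{k}\Bigl(\frac{\lambda_i}{\Lambda}\Bigr)^{x_i},
     \qquad x_1 + \cdots + x_k = N .
\end{align*}
```

The exponential factors cancel because $\sum_i \lambda_i = \Lambda$, and $\Lambda^N = \prod_i \Lambda^{x_i}$ when the $x_i$ sum to N, which leaves the multinomial probability with cell probabilities $\lambda_i/\Lambda$.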

4.6 Nonparametric Procedures

The first nonparametric subset selection procedure was studied by
Rizvi and Sobel (1967) for the problem of selecting the population having
the largest $\alpha$-quantile ($0 < \alpha < 1$) from k populations having
continuous distributions. Assume that the size n of the sample from each
population is sufficiently large so that $1 \le (n+1)\alpha \le n$ and define a
positive integer r by the inequalities $r \le (n+1)\alpha < r+1$. This implies
that $1 \le r \le n$. Let $Y_{j,i}$ denote the jth order statistic from
$\pi_i$, j = 1,...,n; i = 1,...,k. The procedure proposed by Rizvi and Sobel
(1967) is interesting in the sense that it differs from the usual maximum type.
Their rule is

$R_{14}$: Select $\pi_i$ if and only if

$$Y_{r,i} \ge \max_{1 \le j \le k} Y_{r-c,j} \tag{20}$$

where c is the smallest integer with $1 \le c \le r-1$ for which the
P*-condition is satisfied. However, a c-value satisfying the P*-condition
exists only if P* does not exceed a value $P_1$ depending on n, $\alpha$,
and k. The procedure is monotone and the expected subset size is maximized
when all the distributions are identical.
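The following sketch applies a rule of the form "select population i when its rth order statistic is at least the largest (r-c)th order statistic". The integer c is taken as given; determining the smallest c that meets the P*-condition is outside the scope of the sketch, and the function names and sample values are hypothetical.

```python
import math


def quantile_rank(n, alpha):
    """Return r with r <= (n + 1) * alpha < r + 1, requiring 1 <= r <= n."""
    r = math.floor((n + 1) * alpha)
    if not 1 <= r <= n:
        raise ValueError("sample size too small for this alpha")
    return r


def order_statistic_select(samples, alpha, c):
    """Select i when Y_{r,i} >= max_j Y_{r-c,j} (1-based order statistics)."""
    n = len(samples[0])
    r = quantile_rank(n, alpha)
    if not 1 <= c <= r - 1:
        raise ValueError("c must satisfy 1 <= c <= r - 1")
    ordered = [sorted(s) for s in samples]
    threshold = max(o[r - c - 1] for o in ordered)   # largest Y_{r-c,j}
    return [i for i, o in enumerate(ordered, start=1) if o[r - 1] >= threshold]


samples = [list(range(1, 10)),            # medians 5, 7 and 2 respectively
           list(range(3, 12)),
           [0.5 * i for i in range(9)]]
chosen = order_statistic_select(samples, alpha=0.5, c=2)
```

Because the comparison is between the rth and the (r-c)th order statistics rather than against the sample maximum minus a constant, the rule uses only the ordering of the observations.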

Gupta and McDonald (1970) assumed that the distributions $F_{\theta_i}$, i = 1,...,k,
belong  to  a  location  or  a  scale  parameter  family.  For  selecting  the  population 
associated  with  the  largest  parameter,  they  proposed  procedures  based  on 
rank-sum  or  rank-score  statistics  associated  with  the  pooled  sample  obtained 
from  samples  of  size  n  from  each  population.  Of  the  three  procedures  they 
proposed,  two  are  the  usual  maximum  type  procedures,  one  for  the  location  case 
and the other for the scale case. The best that can be said about the LFC
for these procedures is that it occurs when $\theta_{[k-1]} = \theta_{[k]}$.
In general, the LFC for these procedures is not an equi-parameter one unless
k = 2. It was inadvertently claimed so by some authors earlier. The same
difficulty arises
in  the  indifference  zone  formulation.  Formal  counterexamples  were  given  by 
Rizvi and Woodworth (1970). Gupta and McDonald (1970) gave bounds for the
infimum of the PCS. The third procedure of Gupta and McDonald (1970) is

$R_{15}$: Select $\pi_i$ if and only if

$$H_i > d \tag{21}$$

where the $H_i$ are the appropriate statistics. For this rule, the infimum
of the PCS is attained when $\theta_1 = \cdots = \theta_k$; however, this rule
may select an empty subset unless P* is sufficiently large. A few other related
papers are McDonald (1972, 1974), Blumenthal and Patterson (1969), and Puri and
Puri (1968, 1969).

If  we  have  distributions  from  a  location  parameter  family,  we  can 
use  procedures  based  on  one-sample  Hodges-Lehmann  estimators.  For  these 
procedures,  the  LFC  can  be  determined.  Ghosh  (1973)  studied  such  procedures 
with  goals  involving  selection  of  a  fixed  number  of  populations.  Gupta  and 
Huang  (1974)  studied  such  a  procedure  for  the  goal  of  eliminating  populations 
which  are  strictly  inferior  to  the  t  best. 

A  review  of  the  procedures  described  above  and  a  few  other  related  results 
are  given  by  Gupta  and  McDonald  (1982). 

4.7 Selection from Restricted Families

A  restricted  family  of  probability  distributions  is  defined  by  a  partial 
order  relation  with  respect  to  a  known  distribution.  Such  families  provide 
characterizations  of  life-length  distributions  and  thus  are  very  important 
in  reliability  studies.  Selection  rules  for  such  restricted  families  were 
first  considered  by  Barlow  and  Gupta  (1969).  In  order  to  make  our  discussion 
of  the  selection  procedures  for  these  families  adequately  self-contained,  we 
will  define  the  partial  orderings  that  have  been  used.  For  more  details  and 
related references, the reader is referred to Gupta and Panchapakesan (1979,
Chapter 16).

We define the following partial order ($\prec$) relations for two
distributions F and G assumed to be absolutely continuous.

Definitions 4.1. (1) F is said to be convex with respect to G ($F <_c G$) if
and only if $G^{-1}F(x)$ is convex on the support of F.

(2) F is said to be star-shaped with respect to G ($F <_* G$) if and only
if F(0) = G(0) = 0 and $G^{-1}F(x)/x$ is increasing in $x \ge 0$ on the
support of F.

(3) F is said to be r-ordered with respect to G ($F <_r G$) if and only if
$F(0) = G(0) = \tfrac{1}{2}$ and $G^{-1}F(x)/x$ is increasing (decreasing) in
x positive (negative).

(4) F is said to be tail-ordered with respect to G ($F <_t G$) if and only
if $F(0) = G(0) = \tfrac{1}{2}$ and $G^{-1}F(x) - x$ is increasing on the
support of F.

It is known that convex ordering implies star-ordering. Further, when
$G(x) = 1 - e^{-x}$ ($x \ge 0$), $F <_c G$ is equivalent to saying that F has
an increasing failure rate (IFR) and $F <_* G$ is equivalent to saying that F
has an increasing failure rate on the average (IFRA). Of course, if F is IFR,
then it is also IFRA.
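As a small numerical illustration of star-ordering with respect to the exponential distribution, take the Weibull family $F(x) = 1 - e^{-x^a}$ as a worked example (this family is chosen here for illustration and is not discussed in the text). Then $G^{-1}F(x) = x^a$, so the ratio $G^{-1}F(x)/x = x^{a-1}$ is increasing exactly when $a \ge 1$. The sketch checks this on a finite grid, which gives numerical evidence only and is not a proof; all names are hypothetical.

```python
from math import expm1, log1p


def exp_quantile(u):
    """G^{-1}(u) for the standard exponential G(x) = 1 - exp(-x)."""
    return -log1p(-u)


def weibull_cdf(shape):
    """F(x) = 1 - exp(-x**shape); an illustrative F with F(0) = 0."""
    return lambda x: -expm1(-(x ** shape))


def looks_star_shaped(F, G_inv, grid):
    """Check that G^{-1}(F(x)) / x is nondecreasing along an increasing grid."""
    ratios = [G_inv(F(x)) / x for x in grid]
    return all(a <= b + 1e-9 for a, b in zip(ratios, ratios[1:]))


grid = [0.1 * i for i in range(1, 31)]           # x = 0.1, 0.2, ..., 3.0
ifra_like = looks_star_shaped(weibull_cdf(2.0), exp_quantile, grid)
not_ifra = looks_star_shaped(weibull_cdf(0.5), exp_quantile, grid)
```

The first check succeeds (shape 2 has an increasing failure rate) and the second fails (shape 0.5 has a decreasing failure rate), in line with the IFRA characterization stated above.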

Let $\pi_1,\ldots,\pi_k$ have the associated absolutely continuous
distributions $F_1,\ldots,F_k$, respectively. The best population is defined in
terms of a characteristic such as the mean or quantile of a given order. Let
$F_{[k]}$ denote the distribution function of the best population. We assume
that $F_{[k]}$ is stochastically larger than the rest and that $F_i \prec G$,
i = 1,...,k.

Under the above setup, Barlow and Gupta (1969) proposed a procedure for
selecting the population having the largest $\alpha$-quantile
($0 < \alpha < 1$) when all the $F_i$ are star-shaped with respect to a known
G. Let $T_{j,i}$ denote the jth order statistic based on n independent
observations from $\pi_i$, i = 1,...,k, where $j \le (n+1)\alpha < j+1$. The
procedure of Barlow and Gupta (1969) is

$R_{16}$: Select $\pi_i$ if and only if

$$T_{j,i} \ge c \max_{1 \le r \le k} T_{j,r} \tag{22}$$

where c = c(k, P*, n, j) is the largest constant in (0,1) for which the
P*-condition is satisfied. Tables for the constant c and the constant for the
analogous procedure for selecting the population with the smallest
$\alpha$-quantile are given by Barlow, Gupta and Panchapakesan (1969) for the
special case of exponential G, i.e., for the IFRA family of distributions.
Another special case of G is the folded normal obtained by folding
$N(0,\sigma^2)$ at the origin, where $\sigma$ is assumed to be known. The class
of distributions which are star-shaped with respect to the folded normal is a
subclass of IFRA distributions. Selection in terms of quantiles for this
restricted family was considered by Gupta and Panchapakesan (1975).

Barlow and Gupta (1969) considered also the selection of the population
with the largest median from a set of distributions that have lighter tails
than a specified G. The definition of $F_i$ having a lighter tail than G used
by them implies that $F_i$ centered at its median is tail-ordered with respect
to G. The procedure of Barlow and Gupta (1969) applies equally to the
problem of selecting the population with the largest median from a set of
populations which, centered at their respective medians, are tail-ordered
with respect to G. This has been discussed by Gupta and Panchapakesan (1974)
who have given the values of the constant when G is the logistic distribution,
$G(x) = [1 + e^{-x}]^{-1}$.

Some important unified results were obtained by Gupta and Panchapakesan
(1974). They defined a general partial order relation called
$\mathcal{H}$-ordering through a class of functions $\mathcal{H} = \{h\}$ and
discussed a related selection problem. The $\mathcal{H}$-ordering includes the
star- and tail-orderings as special cases. The selection rule is of the type
defined in (10) using a member h of $\mathcal{H}$.

In Section 4.1, we mentioned the general results of Gupta (1966). He
dealt with three special cases of his results. Two of these are location
and scale parameter cases. His third special case is really the case of
a restricted family using $\mathcal{H}$-ordering even though the description
was not in terms of the partial ordering.

4.8 Sequential Procedures

Barron (1968) and Barron and Gupta (1972) investigated a noneliminating
type sequential procedure for selecting the population with the largest
mean from k normal populations with unknown means $\theta_1,\ldots,\theta_k$
and common known variance $\sigma^2$. However, it was assumed that the
successive differences of the ordered $\theta_i$ are known. The sampling for
their procedure is done by taking one observation from each population at each
stage. At any stage, each population that has not been so far classified as
accepted or rejected is subject to one of three possible decisions: accept,
reject, or postpone classification. Sampling continues until all the
populations are classified either as accepted or as rejected. All populations
that are accepted constitute the selected subset. It should be noted that until
all populations are classified, the sampling is made from all populations,
previously classified or not.

Swanepoel and Geertsema (1973) considered a sequential procedure for
selecting the normal population with the largest mean from k populations,
$N(\theta_i, \sigma_i^2)$, where all the parameters are unknown. They defined a
selection sequence using the idea of a confidence sequence introduced by
Robbins (1970). For each $n \ge 1$, let $B_n$ denote a subset of the k
populations defined by n observations from each population. Any sequence
$\{B_n\}$ is a selection sequence if

$$\Pr(\pi_{(k)} \in B_n \text{ for all } n \ge 1) \ge P^* \tag{23}$$

for all $\theta_1,\ldots,\theta_k$. Let $|S(n)|$ denote the size of the subset
$B_n$ and let r denote the number of populations tied for the best population.
Then, for the selection sequence defined by Swanepoel and Geertsema,
$|S(n)| \to r$ a.s. (almost surely) as $n \to \infty$, and
$B_n = \{\pi_{(k-r+1)},\ldots,\pi_{(k)}\}$ a.s. for large n.

The size of the selected subset can be restricted to a maximum of
m ($1 \le m < k$) by defining a stopping variable N as the first integer
$n \ge 1$ such that $|S(n)| \le m$. If $r \le m$, then $N < \infty$ a.s. and
the subset $B_N$ (which contains at most m populations) includes the best
population with a minimum probability P*. However, if $r \ge m+1$, then
$N = \infty$ with positive probability.
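The stopping variable described above can be sketched directly: given the successive subsets $B_1, B_2, \ldots$ (however they are produced), stop at the first stage whose subset has at most m members. The construction of the subsets themselves is not reproduced here, and the function name and the example sequence are hypothetical.

```python
def first_small_subset(subsets, m):
    """Return (N, B_N), where N is the first n >= 1 with |B_n| <= m.

    subsets is an iterable of sets B_1, B_2, ...; returns None when no
    stage in the supplied (finite) sequence has size at most m.
    """
    for n, b in enumerate(subsets, start=1):
        if len(b) <= m:
            return n, b
    return None


# A hypothetical shrinking selection sequence for k = 4 populations.
sequence = [{1, 2, 3, 4}, {2, 3, 4}, {3, 4}, {4}]
stop = first_small_subset(sequence, m=2)
```

Returning `None` for an exhausted sequence mirrors the case in the text where more than m populations are tied for the best and sampling may never stop.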

Gupta and Huang (1975b) discussed three sequential procedures of which
two are parametric and the third is nonparametric. The nonparametric and one
of the parametric procedures are of the nonelimination type. The goal of
their parametric procedures is to select what they called mildly t best
populations. Suppose that $\theta_1,\ldots,\theta_k$ are unknown location
parameters of k given populations. Then $\pi_i$ is called mildly t best if
$\theta_i \ge \theta_{[k-t+1]} - \Delta$, where $\Delta > 0$ is specified. For
t = 1, $\pi_i$ has been called a superior population by Desu (1970) and a good
population by Lehmann (1963). Gupta and Huang (1975b) have discussed their
procedures in a general setup and obtained special results for selecting from
normal populations in terms of their means and variances. Their nonparametric
procedure is for selecting the population with the largest location parameter
when the k populations have absolutely continuous distributions
$F(x - \theta_i)$, i = 1,...,k. It is assumed that $F(\cdot)$ is symmetric
about the origin and that the densities are Polya frequency functions of order
2 and differentiable almost everywhere.
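The definition of a mildly t best population, read as $\theta_i \ge \theta_{[k-t+1]} - \Delta$ with $\theta_{[1]} \le \cdots \le \theta_{[k]}$ the ordered parameters, can be made concrete with known parameter values. The sketch below only identifies which populations qualify; it is not a selection procedure, since in practice the parameters are unknown. The names and numbers are hypothetical.

```python
def mildly_t_best(theta, t, delta):
    """Return labels i with theta_i >= theta_[k-t+1] - delta."""
    k = len(theta)
    if not 1 <= t <= k or delta <= 0:
        raise ValueError("need 1 <= t <= k and delta > 0")
    ordered = sorted(theta)
    cutoff = ordered[k - t] - delta        # theta_[k-t+1] in 1-based notation
    return [i for i, th in enumerate(theta, start=1) if th >= cutoff]


theta = [1.0, 2.5, 2.0, 3.0]
good = mildly_t_best(theta, t=1, delta=0.5)      # within 0.5 of the best
mild = mildly_t_best(theta, t=2, delta=0.5)      # within 0.5 of the 2nd best
```

The t best populations always qualify, so the set has at least t members, and it grows as $\Delta$ increases.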

Carroll (1974) has discussed some asymptotically nonparametric sequential
selection procedures.

4.9 Other Developments

In  the  early  investigations,  detailed  results  were  obtained  only  for 
procedures  which  used  samples  of  a  common  size  from  the  populations  under 
consideration.  Also,  in  the  case  of  selection  in  terms  of  the  means  from 
k  normal  populations,  the  early  investigations  assumed  that  the  variances 
are equal. When the variances are not equal (that is, under heteroscedasticity),
the  only  trivial  case  is  when  they  are  all  known  and  the  sample  sizes  are  the 
same.  To  handle  various  other  situations  that  arise,  several  procedures  were 
proposed  and  investigated. 

Let $\pi_1,\ldots,\pi_k$ be k normal populations with unknown means
$\theta_1,\ldots,\theta_k$ and (known or unknown) variances
$\sigma_1^2,\ldots,\sigma_k^2$. Let $\bar X_i$ and $s_i^2$ denote the mean and
the variance (divisor $n_i - 1$) of a random sample of size $n_i$ from $\pi_i$,
i = 1,...,k. Let $s^2 = \sum_{i=1}^{k} (n_i - 1)s_i^2/(N-k)$, where
$N = \sum_{i=1}^{k} n_i$.
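The pooled variance $s^2 = \sum_i (n_i - 1)s_i^2/(N - k)$ can be computed from the per-sample variances and sizes as in this short sketch (the function name and the numbers are illustrative):

```python
def pooled_variance(variances, sizes):
    """Pooled estimate sum((n_i - 1) * s_i^2) / (N - k), with N = sum(n_i)."""
    k = len(sizes)
    total = sum(sizes)
    if total <= k:
        raise ValueError("need N > k for the pooled estimate")
    return sum((n - 1) * s2 for s2, n in zip(variances, sizes)) / (total - k)


s2 = pooled_variance(variances=[4.0, 1.0], sizes=[3, 5])
```

Each sample variance is weighted by its degrees of freedom, so larger samples contribute more to the pooled estimate.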

Let us first consider the case of known variances. In describing the
several procedures, we have used the same letter d to denote the constant
in each case. This constant d is the smallest positive constant for which
the P*-condition is satisfied. Also, if $\sigma$ rather than $\sigma_i$
appears, it is assumed that $\sigma_1 = \cdots = \sigma_k = \sigma$. Gupta and
W. T. Huang (1974) proposed the rule

$R_{17}$: Select $\pi_i$ if and only if $\bar X_i \ge \max_{1 \le j \le k} \bar X_j - \dfrac{d\sigma}{\sqrt{n_i}}$.

Later Gupta and D. Y. Huang (1976a) proposed the rule

$R_{18}$: Select $\pi_i$ if and only if $\bar X_i \ge \max_{1 \le j \le k} \Bigl( \bar X_j - d\sigma \sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}} \Bigr)$.

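For the case of unequal variances, Gupta and Wong (1976) proposed the rule

$R_{19}$: Select $\pi_i$ if and only if $\bar X_i \ge \max_{1 \le j \le k} \Bigl( \bar X_j - d \sqrt{\dfrac{\sigma_i^2}{n_i} + \dfrac{\sigma_j^2}{n_j}} \Bigr)$.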
Chen, Dudewicz and Lee (1976) proposed a rule assuming $\sigma$ to be unknown.
In the case of known $\sigma$, their rule would be

$R_{20}$: Select $\pi_i$ if and only if $\bar X_i \ge \max_{1 \le j \le k} \bar X_j - d\sigma \sqrt{\dfrac{1}{n_i} + \dfrac{1}{a}}$

where a is a nonnegative constant but usually chosen between $n_{[1]}$ and
$n_{[k]}$, both inclusive.
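To compare how such rules treat unequal sample sizes, the sketch below implements three of them under assumed forms: a rule with yardstick $d\sigma/\sqrt{n_i}$, a rule with pairwise yardstick $d\sigma\sqrt{1/n_i + 1/n_j}$, and its unequal-variance analogue with $d\sqrt{\sigma_i^2/n_i + \sigma_j^2/n_j}$. These forms are this sketch's reading of the rules and should be checked against the original papers; d is supplied rather than computed from the P*-condition, and all names and numbers are hypothetical.

```python
from math import sqrt


def select_common_yardstick(means, sizes, sigma, d):
    """Select i when mean_i >= max_j mean_j - d * sigma / sqrt(n_i)."""
    top = max(means)
    return [i for i, (x, n) in enumerate(zip(means, sizes), start=1)
            if x >= top - d * sigma / sqrt(n)]


def select_pairwise_unequal(means, sizes, sigmas, d):
    """Select i when, for every j,
    mean_i >= mean_j - d * sqrt(sigma_i^2 / n_i + sigma_j^2 / n_j)."""
    k = len(means)
    return [i + 1 for i in range(k)
            if all(means[i] >= means[j]
                   - d * sqrt(sigmas[i] ** 2 / sizes[i]
                              + sigmas[j] ** 2 / sizes[j])
                   for j in range(k))]


def select_pairwise_equal(means, sizes, sigma, d):
    """Common-variance special case of the pairwise rule."""
    return select_pairwise_unequal(means, sizes, [sigma] * len(means), d)


means, sizes = [10.0, 9.0, 7.0], [4, 16, 4]
by_common = select_common_yardstick(means, sizes, sigma=2.0, d=1.0)
by_pairwise = select_pairwise_equal(means, sizes, sigma=2.0, d=1.0)
```

In this example the second population has a large sample, so the common-yardstick rule holds it to a tight margin of 0.5 and drops it, while the pairwise rule also accounts for the small sample behind the leading mean and retains it.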

For all the above procedures ($R_{17}$ through $R_{20}$), the respective
authors have obtained lower bounds for the infimum of the PCS. For k = 2, Gupta
and D. Y. Huang (1976a) have shown $R_{18}$ to be more efficient than $R_{17}$
in terms of the supremum of the expected subset size. Berger and Gupta (1980)
considered minimax subset selection rules using the criterion
$M = \max_{1 \le j \le k-1} P(\pi_{(j)} \text{ is selected})$. They have shown
that $R_{18}$ and $R_{19}$ (when $\sigma_1 = \cdots = \sigma_k = \sigma$) are
minimax with respect to M in the class of nonrandomized, just, and translation
invariant rules which satisfy the P*-condition. The rules $R_{17}$ and $R_{20}$
are not minimax, in general.

Now, let us consider the case of unknown variances. The counterparts
of the rules $R_{17}$, $R_{18}$ and $R_{20}$ were proposed by the respective
authors where the new rules $R'_{17}$, $R'_{18}$ and $R'_{20}$ are of similar
forms with s in the place of $\sigma$; of course, the constant d will now have
a different value in each case. Gupta and Wong (1976) proposed a rule
$R'_{19}$ which selects $\pi_i$ if and only if

$$\bar X_i \ge \max_{1 \le j \le k} \bar X_j - c \max_{1 \le j \le k} s_j.$$

As in the case of known $\sigma_i$'s, all the authors have given only lower
bounds for the infimum of the PCS. Some comparisons of $R'_{17}$, $R'_{18}$ and
$R'_{20}$ are given by Chen, Dudewicz and Lee (1976).

When the variances are unknown and unequal, and the sample sizes are
unequal, Dudewicz and Dalal (1975) proposed a two-stage procedure for
selecting the population with the largest mean under the indifference zone
formulation. Let $n_0$ be the first stage sample size for each population and
let $n_i - n_0$ be the second stage sample size from $\pi_i$, i = 1,...,k.
Their procedure is based on the statistics $\tilde X_i$, where $\tilde X_i$ is
a weighted average based on all the $n_i$ observations from $\pi_i$,
i = 1,...,k. The weights are chosen subject to certain conditions. They
proposed a subset selection rule

$R_{21}$: Select $\pi_i$ if and only if $\tilde X_i \ge \max_{1 \le j \le k} \tilde X_j - d$

where d > 0. For this procedure, the P*-condition is satisfied irrespective
of the choice of the positive constant d. So one has to impose some
additional restriction in order to have a meaningful choice of d. One
possibility is to introduce a restriction on the expected subset size in
some configuration of the means. It is worth noting that a two-stage procedure
is not necessary in the subset selection approach whereas it is necessary
in the indifference zone approach.

5.  YEARS  OF  FURTHER  STRIDES  (1975-1980) 

Although  several  important  contributions  were  made  during  this  period, 
the  foremost  and  the  most  dominant  of  these  were  to  the  development  of 
the  decision-theoretic  approach  to  subset  selection.  Besides  Bayes 
procedures, several minimax and $\Gamma$-minimax rules were derived. The first
paper  on  locally  optimal  rules  appeared.  Several  contributions  were  made 
with  regard  to  classical  procedures  for  specific  families  of  distributions 
representing  an  outgrowth  of  the  research  in  this  direction  from  the  previous 
period  of  main  growth. 

5.1 Bayes Procedures

In Section 4.3, we discussed the early developments of Bayes procedures
using linear loss functions. The first papers to come out with nonlinear
loss functions are Bickel and Yahav (1977), Chernoff and Yahav (1977),
Goel and Rubin (1977), and Gupta and Hsu (1978). They used different
linear combinations of four components of loss, namely, (i) ICS($\theta$,S),
the simple loss due to an incorrect selection, which takes values 0 or
1 according as a correct selection is or is not made, (ii) |S|, the size
of the selected subset, (iii) $\theta_{[k]} - \max_{j \in S} \theta_j$, which
expresses a measure of loss in using in the future the populations that are
selected, and (iv) $\theta_{[k]} - \sum_{j \in S} \theta_j/|S|$, which is an
'average' loss in using in the future the populations that are selected. The
components used in the linear combination are: (i) and (iv) by Bickel and Yahav
(1977), (iii) and (iv) by Chernoff and Yahav (1977), (ii) and (iii) by Goel and
Rubin (1977), and (i) and (ii) by Gupta and Hsu (1978).
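The four loss components can be evaluated for a given parameter vector and a given nonempty selected subset as in the sketch below. The weights each set of authors attached to the components are not reproduced, and the function name and example values are hypothetical.

```python
def loss_components(theta, selected):
    """Return (incorrect, size, max_gap, avg_gap) for a nonempty subset.

    incorrect : 0 if a population with the largest theta is selected, else 1
    size      : |S|, the number of selected populations
    max_gap   : theta_[k] minus the largest selected theta
    avg_gap   : theta_[k] minus the average of the selected thetas
    """
    if not selected:
        raise ValueError("this sketch assumes a nonempty selected subset")
    best = max(theta)
    chosen = [theta[i - 1] for i in sorted(selected)]   # labels are 1-based
    incorrect = 0 if max(chosen) == best else 1
    return (incorrect, len(chosen),
            best - max(chosen), best - sum(chosen) / len(chosen))


theta = [1.0, 3.0, 2.0]
missed = loss_components(theta, {1, 3})       # best population left out
hit = loss_components(theta, {2, 3})          # best population included
```

Components (iii) and (iv) differ only when more than one population is selected: the first looks at the best of the selected populations, the second at their average.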

Goel and Rubin (1977) assumed that k distributions have densities
and belong to a family with monotone likelihood ratio property. The
parametric vector $\theta$ was assumed to be symmetric. They derived the Bayes
rule under this setup and obtained further simplifications in the case of the
prior distribution of $\theta$ being a mixture of i.i.d. random variables.
They also derived an 'approximate' Bayes rule R, which selects larger
subsets than the Bayes rule but is the Bayes rule for k = 2. This approximate
Bayes rule, under a further assumption regarding the form of the posterior
distribution of $\theta$, reduces to the classical procedure of Gupta (1956)
except that the constant that is involved depends only on the cost per
population. Goel and Rubin (1977) also discussed the special case of normal
populations, $N(\theta_i, \sigma^2)$, i = 1,...,k, where $\sigma^2$ is known
and $\theta$ has an exchangeable multivariate normal prior for all k.

Chernoff and Yahav (1977) considered selecting the population with the
largest mean from k normal populations with means $\theta_1,\ldots,\theta_k$
and a common known variance $\sigma^2$, where $\theta$ has an exchangeable
normal prior. They compared their Bayes procedure with the (random size) subset
selection procedure of Gupta (1956) and the fixed size subset selection
procedure of Desu and Sobel (1968). Their results were based on Monte Carlo
studies of the integrated risks with respect to different exchangeable normal
priors.

Bickel and Yahav (1977) also considered k normal populations with means $\theta_1,\ldots,\theta_k$ and a common known variance $\sigma^2$. They investigated the optimal solution when k goes to infinity under the assumption that the "empirical distributions" of the means $\theta_i$, i = 1,...,k, converge in a suitable sense to a smooth limiting probability distribution. Their asymptotic solution is: Select the populations that generated the last $100\lambda_0$ percent of the order statistics, where $\lambda_0$ depends on the limiting distribution of the $\theta_i$ and on the penalty associated with a wrong decision.

Gupta  and  Hsu  (1978)  studied  the  comparative  performances  of  the  maximum 
type  procedure  of  Gupta  (1956)  and  the  average  type  procedure  of  Seal  (1955) 
with  their  Bayes  procedure  in  the  case  of  normal  means  with  exchangeable 
normal  priors.  Their  Monte  Carlo  results  indicate  that  the  maximum  type 
procedures  do  almost  as  well  as  the  Bayes  procedures.  Similar  results  have 
been  found  under  different  loss  functions  by  Chernoff  and  Yahav  (1977), 
and  Hsu  (1977).  These  studies  are  useful  because  an  easy-to-implement 
procedure whose performance is close to that of the Bayes procedure is most
welcome from a practical point of view; Bayes procedures typically require
numerical integrations to implement them and are difficult to compute
explicitly.

In other developments, Gupta and Hsu (1977), using the same loss function as in their 1978 paper, established the monotonicity of Bayes subset selection procedures, under a certain generalized monotone likelihood ratio assumption, for a restricted class of priors. Miescke (1979) assumed certain additive loss functions and showed that, in the normal case with symmetric priors, the Gupta procedure is the limit of Bayes rules as the sample size tends to infinity.

5.2 Minimax and Γ-minimax Rules

For  the  class  of  subset  selection  rules  for  which  the  PCS  is  at  least 
P*,  Berger  (1979)  investigated  minimaxity  taking  the  loss  as  measured  by 
the  subset  size.  Under  certain  mild  conditions,  he  showed  that  the  minimax 
value  cannot  be  less  than  kP*.  Applying  this  to  location  and  scale  problems, 
he  showed  that,  under  the  monotone  likelihood  ratio  assumption,  the  rules  of 
Gupta  (1965)  are  minimax.  He  also  established  some  necessary  conditions 
for  minimaxity.  One  of  these  conditions  is  related  to  (12)  which  is  an 
important  property  of  just  selection  rules.  It  should  be  noted  that  if  a 
rule  is  minimax  with  respect  to  the  subset  size  |S|,  then  it  is  minimax  also 
with  respect  to  |S'|,  where  S'  consists  of  all  the  nonbest  populations  that 
are  selected. 

Berger  and  Gupta  (1980)  obtained  minimax  rules  in  the  class  of  nonrandomized, 
just,  and  invariant  rules  when  the  risk  is  measured  by  the  maximum  probability 
of  including  a  nonbest  population.  These  rules  are  of  the  form  proposed  and 
studied  by  Gupta  (1965)  in  location  and  scale  parameter  problems.  Using  their 
results,  Berger  and  Gupta  (1980)  examined  the  minimaxity  of  several  rules  for 
the  normal  means  problem  when  the  variances  are  known  but  not  necessarily 
equal  and  the  sample  sizes  are  unequal.  We  have  referred  to  these  results 
in  Section  4.9. 

Bjørnstad (1980) compared three minimax procedures for the normal means problem where the common known variance $\sigma^2 = 1$. Let $\theta_1,\ldots,\theta_k$ denote the means. The three procedures are the maximum type procedure of Gupta (1956), the average type procedure of Seal (1955) and a procedure of Studden (1967). The performances of these three were compared by using the expected number of bad populations (that is, those for which $\theta_i < \theta_{[k]} - \Delta$, $\Delta > 0$ given) as the criterion, while controlling the PCS when there is only one good population. The numerical comparisons made for several slippage configurations showed that the average type procedure is inferior to the other two. While these two other rules seem quite comparable, the maximum type rule performs better when $\Delta$ is small.

The use of partial or incomplete prior information in statistical inference has led to the so-called Γ-minimax criterion, a term initially used by Blum and Rosenblatt (1967). The basic idea of the criterion, however, is due to Robbins (1951). Here we assume that the prior distribution is a member of a subset Γ of the class of all priors.

The Γ-minimax rule is obtained by minimizing the maximum expected risk over Γ. When Γ consists of a single prior, we get the Bayes rule for that prior. On the other hand, if Γ consists of all priors, then we get the usual minimax rule.

Gupta and Huang (1977) derived a Γ-minimax procedure for selecting the best population. Let $\pi_1,\ldots,\pi_k$ be k populations with densities $f_i$, i = 1,...,k, respectively. Define $\tau_i = \max_{1 \le j \le k,\, j \ne i} \tau_{ij}$, where $\tau_{ij}$ is a measure of separation between $\pi_i$ and $\pi_j$. Let $\Omega_i = \{\theta \mid \tau_i \le \tau_0\}$, i = 1,...,k, where $\tau_0$ is a given constant. The parameter space $\Omega$ is partitioned into $\Omega_0 \cup \Omega_1 \cup \ldots \cup \Omega_k$, where $\Omega_0$ can possibly be the empty set. The population $\pi_i$ such that $\tau_i = \min_{1 \le j \le k} \tau_j$ is called the best population. Here $\tau_0$ is appropriately chosen so that $\Omega_i$ corresponds to configurations where $\pi_i$ is the best, i = 1,...,k. When $\theta \in \Omega_0$, selection of any one of the populations is considered a correct selection. For deriving their Γ-minimax rule, Gupta and Huang (1977) assumed that $\Gamma = \{\rho(\theta) \mid \int_{\Omega_i} d\rho(\theta) = q_i,\ i = 1,\ldots,k\}$, where $\rho(\theta)$ is a prior distribution and $q_1,\ldots,q_k$ are nonnegative with $\sum_{i=1}^{k} q_i \le 1$.

Gupta and Kim (1980) considered minimax and Γ-minimax rules for partitioning k populations in comparison with a standard or a control $\pi_0$. Let $\pi_i$ have density $f_i(x - \theta_i)$, where $f_i$ is symmetric about the origin and is strongly unimodal (that is, $f_i$ is log-concave on the real line). Any population $\pi_i$ is superior, equivalent, or inferior to $\pi_0$ according as $\theta_i - \theta_0 \ge \Delta$, or $-\Delta < \theta_i - \theta_0 < \Delta$, or $\theta_i - \theta_0 \le -\Delta$, where $\Delta > 0$ is given. Gupta and Kim (1980), under appropriate losses for misclassifications of the populations, derived Γ-minimax and minimax rules in the known $\theta_0$ case as well as the unknown $\theta_0$ case. When $\theta_0$ is unknown, attention was restricted to the class of rules for which the decision about $\pi_i$ depends only on the observations from $\pi_i$ and $\pi_0$, i = 1,...,k.

5.3 Construction of Optimal Selection Procedures and an Essentially Complete Class

Gupta and Huang (1980a) obtained a selection procedure under certain optimality conditions. Though their results are obtained in a general setup, we will describe them in terms of the normal means problem for simplicity. Let $\pi_1,\ldots,\pi_k$ be normal populations with unknown means $\theta_1,\ldots,\theta_k$ and a common variance $\sigma^2 = 1$. The population associated with the largest $\theta_i$ is the best population. A selection procedure is specified by the individual selection probabilities for the populations. Gupta and Huang (1980a) derived an optimal rule in the class of rules for which the PCS is at least $\gamma$ ($0 < \gamma < 1$) by minimizing the supremum of the expected subset size. In the general setup, the result requires a generalized version of the monotone likelihood ratio for the multidimensional case.

Gupta and Huang (1980b) considered the class C of rules for which the size of the selected subset is controlled when the distributions are identical. The goal is to obtain a rule in this class which maximizes the infimum of the PCS over the parameter space $\Omega$. Under certain assumptions, Gupta and Huang (1980b) obtained an essentially complete class of rules for this problem. In this regard, a rule $\delta_1$ is defined to be as good as $\delta_2$ if $\inf_{\Omega} P(CS \mid \delta_1) \ge \inf_{\Omega} P(CS \mid \delta_2)$, where both $\delta_1$ and $\delta_2$ belong to the class C. The essentially complete class obtained by Gupta and Huang is the class of monotone selection procedures. If we have observations $x_1,\ldots,x_k$ from k populations with densities $f(x - \theta_i)$, i = 1,...,k, let $y_{ij} = x_i - x_j$, j = 1,...,k; j ≠ i. Let $y_i = (y_{i1},\ldots,y_{ik})$, and let $\delta_i$ denote the individual selection probability for $\pi_i$, i = 1,...,k. Then the selection rule $\delta = (\delta_1,\ldots,\delta_k)$ is monotone if, for any i and j and for fixed $y_{ir}$, r = 1,...,k, r ≠ i, j, $\delta_i(y_i)$ is nondecreasing in $y_{ij}$.

5.4 Locally Optimal Subset Selection Rules

Gupta,  Huang  and  Nagel  (1979)  were  the  first  to  investigate  locally 
optimal  subset  selection  rules.  They  were  interested  in  obtaining  such 
rules  based  on  ranks  while  still  assuming  that  the  distributions  associated 
with the populations were known. This is appealing especially in situations
in which the order of the observations is more readily available than the actual
measurements themselves due to excessive cost or other physical constraints.

Let $f(x, \theta_i)$ be the density associated with $\pi_i$, $\theta_i \in \Theta$, i = 1,...,k, where $\Theta$ is an interval containing the origin. Let $\Delta = (\Delta_1,\ldots,\Delta_N)$ denote the rank configuration of the pooled sample of N = kn observations obtained by taking a sample of size n from each population. Here $\Delta_i = j$ means that the ith smallest observation in the pooled sample came from $\pi_j$. Let $\Omega_0 = \{\theta: \theta_1 = \ldots = \theta_k\}$. The goal is to derive a permutation-invariant rule $\delta$ based on the rank configuration $\Delta$ such that

$\inf_{\theta \in \Omega_0} P_\theta(CS \mid \delta, \Delta) = P^*$  (24)

subject to the condition:

maximize $P_\theta(CS \mid \delta, \Delta)$ for all $\theta \in A_0$,  (25)

where $A_0$ denotes a neighborhood of any $\theta_0 \in \Omega_0$. Since the distribution of the ranks does not depend on the underlying distributions when $\theta \in \Omega_0$, the condition (24) implies that the PCS is constant for $\theta \in \Omega_0$. So $A_0$ can be taken as a neighborhood of $\theta_0 = (0,\ldots,0)$. Gupta, Huang and Nagel (1979) derived a locally optimal (in the sense of (25)) rule under certain regularity conditions on $f(x, \theta)$. If $f(x, \theta) = e^{-(x-\theta)}/[1 + e^{-(x-\theta)}]^2$, the logistic density, their rule becomes: Select $\pi_i$ if and only if $R_i \ge c$, where $R_i$ is the rank-sum statistic for the sample from $\pi_i$, i = 1,...,k. This is the procedure $R_{15}$ defined by (21) of Gupta and McDonald (1970).

Some other locally optimal subset selection rules with different optimality criteria have been recently obtained by Huang and Panchapakesan (1982b) and Huang, Panchapakesan and Tseng (1984). These will be discussed in the next section.

5.5 Modified Goal for Subset Selection, and a Complete Ranking Formulation

In Section 4.2, we discussed the restricted subset selection formulation of Santner (1975) whose aim was to restrict the size of the selected subset by an upper bound $m \le k-1$. Huang and Panchapakesan (1976) studied a modified formulation in which, besides controlling the PCS, an upper bound (a specified constant in (0, k-1]) is imposed on the supremum of the expected subset size whenever the parametric vector $\theta = (\theta_1,\ldots,\theta_k)$ belongs to a preference zone defined appropriately for a specified $\delta^*$. The treatment of the problem by Huang and Panchapakesan (1976) is in a general setup that includes location and scale parameter cases. As in the restricted subset size formulation of Santner (1975), one has to determine a constant associated with the rule as well as the smallest sample size needed to meet the requirements. A specific application to selection in terms of the treatment effects in a two-way layout has been given in Huang and Panchapakesan (1976).

In another paper, Huang and Panchapakesan (1978) have considered a subset selection formulation of the complete ranking problem. Let $\theta_1,\ldots,\theta_k$ be the unknown parameters of k populations. The interest is in ranking the k populations according to their $\theta$-values. Any permutation of the set of integers {1,2,...,k} corresponds to a ranking of the populations. Huang and Panchapakesan (1978) considered the problem of selecting a nonempty subset (of a random size) of all the k! possible permutations such that the correct ranking is included in the selected subset of permutations with a minimum probability $P^* \in (1/k!, 1)$. They have discussed location and scale parameter cases. If $X_1,\ldots,X_k$ are the observations from $\pi_1,\ldots,\pi_k$ with densities $f(x - \theta_i)$, i = 1,...,k, the procedure of Huang and Panchapakesan (1978) is

$R_{22}$: Include the ranking $(i_1, i_2, \ldots, i_k)$ in the selected subset if and only if

$\max_{2 \le s \le k} \max_{1 \le r < s} (X_{i_r} - X_{i_s}) \le d$,  (26)

where d is the smallest positive constant for which the P*-condition is satisfied. The infimum of the PCS is attained when $\theta_1 = \ldots = \theta_k$.
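
A minimal sketch of a rule of the form (26) is given below for a single observation per population. It assumes that a ranking $(i_1,\ldots,i_k)$ lists the populations from smallest to largest parameter, which is an interpretation adopted here; the constant d is taken as given rather than computed from the P*-condition, and the name `select_rankings` is hypothetical.

```python
from itertools import permutations


def select_rankings(x, d):
    """Return all rankings (tuples of zero-based indices) retained by a
    rule of the form (26): every earlier-ranked observation may exceed a
    later-ranked one by at most d."""
    k = len(x)
    selected = []
    for perm in permutations(range(k)):
        # largest amount by which an earlier-ranked observation
        # exceeds a later-ranked one
        worst = max(x[perm[r]] - x[perm[s]]
                    for s in range(1, k) for r in range(s))
        if worst <= d:
            selected.append(perm)
    return selected


# Example with k = 3 observations and an assumed constant d = 1.5.
rankings = select_rankings([0.0, 1.0, 5.0], 1.5)
```

For any nonnegative d the ranking induced by the observed order is always retained, so the selected subset is never empty.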
5.6 Entropy-based Selection

Selection  in  terms  of  an  information  measure  was  first  considered 
by  Gupta  and  Huang  (1976b)  and  Gupta  and  Wong  (1977a).  The  former  paper 
was  concerned  with  binomial  populations  and  the  latter  with  multinomial 
populations.  The  significance  of  these  papers  is  due  not  only  to  the 
importance  of  information-theoretic  measures  in  practice  but  also  to  the 
illustration  of  the  use  of  the  concepts  of  majorization  and  Schur  functions 
in  obtaining  probability  inequalities  in  selection  problems. 

Let $\pi_1,\ldots,\pi_k$ be k multinomial populations each with m cells and let the unknown cell-probabilities of $\pi_i$ be $p_{i1},\ldots,p_{im}$, i = 1,...,k. The Shannon entropy function associated with $\pi_i$ is

$H_i = H(p_{i1},\ldots,p_{im}) = -\sum_{j=1}^{m} p_{ij} \log p_{ij}$.

This function is a measure of the uncertainty with regard to the nature of the outcomes from $\pi_i$, i = 1,...,k. The populations are to be ranked in terms of the entropy function. For m = 2, the problem of selecting the population with the largest $H_i$ reduces to that of selecting the binomial population associated with the largest $\psi(\theta_i) = -\theta_i \log \theta_i - (1-\theta_i)\log(1-\theta_i)$, where $\theta_i$ is the success probability. Gupta and Huang (1976b) studied a selection rule in this case. Their rule is

$R_{23}$: Select $\pi_i$ if and only if

$\psi(X_i/n) \ge \max_{1 \le j \le k} \psi(X_j/n) - d$,  (27)

where $X_i$ is the number of successes in n trials associated with $\pi_i$, $\psi(X_i/n) = H(X_i/n,\ 1 - X_i/n)$, and d = d(k,n,P*) is the smallest constant with $0 < d \le \psi([n/2]/n)$ for which the P*-condition is satisfied. Here [n/2] denotes the largest integer $\le n/2$. For this rule, the infimum of the PCS takes place when $\theta_1 = \ldots = \theta_k = \theta$. However, as in the case of $R_7$ of Section 3, the common value of $\theta$ for which the infimum is attained is not known. Gupta and Huang (1976b) have obtained a conservative value of d as was done by Gupta, Huang and Huang (1976) in the case of $R_7$.
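
The form of a rule such as (27) can be sketched as follows. This is an illustration under stated assumptions: the constant d is supplied by the user rather than derived from (k, n, P*), natural logarithms are used, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """psi(p) = -p log p - (1 - p) log(1 - p), with psi(0) = psi(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def select_by_entropy(successes, n, d):
    """Select population i when psi(X_i/n) >= max_j psi(X_j/n) - d."""
    psi = [binary_entropy(x / n) for x in successes]
    cutoff = max(psi) - d
    return [i for i, value in enumerate(psi) if value >= cutoff]


# Example: success counts out of n = 10 trials, with an assumed d = 0.2.
chosen = select_by_entropy([5, 2, 9], 10, 0.2)
```

Populations whose observed success proportion is close to 1/2 have the largest estimated entropy and are therefore the ones retained.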

To discuss the general case of m > 2, we need the following definitions. Let $a = (a_1,\ldots,a_m)$ and $A_r = \sum_{i=r}^{m} a_{[i]}$, where $a_{[1]} \le \ldots \le a_{[m]}$ denote the ordered components.

Definitions 4.2. A vector $a = (a_1,\ldots,a_m)$ is said to majorize another vector $b = (b_1,\ldots,b_m)$ of the same dimension (written $a \succ b$ or $b \prec a$) if $A_r \ge B_r$ for r = 2,...,m, and $A_1 = B_1$. A function f is said to be Schur-convex (Schur-concave) if $f(x) \ge f(x')$ ($f(x) \le f(x')$) whenever $x \succ x'$.
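
The definition of majorization can be checked numerically as in the sketch below, which also illustrates the Schur-concavity of the Shannon entropy on one pair of probability vectors. The function names and the numerical tolerance are assumptions made for illustration.

```python
import math


def majorizes(a, b, tol=1e-12):
    """True if a majorizes b: tail sums of the ordered components of a
    dominate those of b, and the total sums agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    sa, sb = sorted(a), sorted(b)
    tail_a = tail_b = 0.0
    # accumulate A_m, A_{m-1}, ..., A_2 (sums of the largest components)
    for r in range(len(a) - 1, 0, -1):
        tail_a += sa[r]
        tail_b += sb[r]
        if tail_a < tail_b - tol:
            return False
    # A_1 = B_1: equal totals
    return abs(sum(a) - sum(b)) <= tol


def shannon_entropy(p):
    """-sum p_j log p_j, with the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


a = (0.7, 0.2, 0.1)
b = (0.4, 0.3, 0.3)
a_majorizes_b = majorizes(a, b)
```

Here the more spread-out vector a majorizes b, and its entropy is the smaller of the two, as Schur-concavity requires.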

In our selection problem, we assume that there is a population whose associated vector of cell-probabilities is majorized by the associated vector of any other population. Such a population will have the largest $H_i$ because the entropy function is Schur-concave. Gupta and Wong (1977a) proposed the rule

$R_{24}$: Select $\pi_i$ if and only if

$\varphi(X_{i1}/n, \ldots, X_{im}/n) \ge \max_{1 \le j \le k} \varphi(X_{j1}/n, \ldots, X_{jm}/n) - d$,  (28)

where $X_{i1},\ldots,X_{im}$ are the cell-counts based on n independent observations from $\pi_i$ (i = 1,...,k), $\varphi$ is a Schur-concave function, and d = d(k,m,n,P*) is the smallest positive constant for which the P*-condition is satisfied. Gupta and Wong (1977a) obtained a conservative value of d using the idea of conditioning as in the paper of Gupta and Huang (1976b).

5.7 Other Developments

Here  we  discuss  several  developments  dealing  with  various  aspects  of 
subset  selection  procedures  discussed  in  earlier  sections.  These  relate  to 
selection  procedures  for  Poisson  processes,  selection  from  restricted  families, 
selection  procedures  based  on  medians,  robust  nonparametric  procedures,  selecting 
a  good  subset  of  the  predictor  variables,  and  subset  selection  used  for  screening 
in  a  two-stage  procedure  for  selecting  one  population  as  the  best. 

Selection procedures for Poisson processes. Let $\pi_1,\ldots,\pi_k$ be k Poisson processes with unknown mean rates $\mu_1,\ldots,\mu_k$, respectively. Gupta and Wong (1977b) investigated four different procedures for selecting the process with the largest $\mu_i$. Two of these procedures are based on the number of occurrences $X_i(t_0)$, i = 1,...,k, during time $t_0$ from these processes and are essentially the rules defined by (16) and its conditional analogue, both discussed in Section 4.5. A third procedure is

$R_{25}$: Observe the processes continuously until $\max_{1 \le i \le k} X_i(t) = N$. Select $\pi_i$ if and only if

$X_i(t) \ge N - c$,  (29)

where N is a positive integer specified in advance, and c = c(k,P*,N) is the smallest nonnegative integer for which the P*-condition is satisfied.

The infimum of the PCS for the rule $R_{25}$ takes place when $\mu_1 = \ldots = \mu_k = \mu$ and is independent of the common value $\mu$. The constant c is the smallest integer ($0 \le c \le N$) which satisfies

$\int_0^\infty [1 - G_N(t)]^{k-1}\, dG_{N-c}(t) \ge P^*$,  (30)

where $G_r(t)$ is the cdf of the gamma distribution with unit scale parameter and shape parameter r.

The fourth procedure of Gupta and Wong (1977b) is based on observing the processes at successive intervals of time, t = 1,2,..., until the first time $t_0$ when $X_i(t_0) \ge N$ for some i. The selection procedure is based on the waiting times for N and N-c occurrences, where c is the constant of the rule $R_{25}$. The details of this procedure are omitted here. The infimum of the PCS for this rule is the same as in the case of $R_{25}$, namely, the left-hand side of (30).
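
The left-hand side of (30) can be evaluated numerically, and the smallest qualifying c found by direct search, as in the following sketch. It relies on the standard identity that, for integer shape r, the gamma survival function equals a Poisson partial sum; the use of Simpson's rule, the truncation point of the integral, and all function names are choices made here for illustration and are not taken from Gupta and Wong (1977b).

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale and integer shape >= 1."""
    if t < 0:
        return 0.0
    if t == 0:
        return 1.0 if shape == 1 else 0.0
    return math.exp((shape - 1) * math.log(t) - t - math.lgamma(shape))


def gamma_sf(t, shape):
    """1 - G_shape(t) for integer shape, via the Poisson partial sum."""
    if t <= 0:
        return 1.0
    term = math.exp(-t)
    total = term
    for j in range(1, shape):
        term *= t / j
        total += term
    return total


def pcs_equal_rates(k, big_n, c, steps=4000):
    """Approximate the integral in (30) by Simpson's rule (steps even)."""
    if c >= big_n:
        return 1.0  # N - c = 0 occurrences are reached immediately
    upper = big_n + 12.0 * math.sqrt(big_n) + 30.0  # assumed truncation point
    h = upper / steps

    def integrand(t):
        return gamma_sf(t, big_n) ** (k - 1) * gamma_pdf(t, big_n - c)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def smallest_c(k, big_n, p_star):
    """Smallest integer c in [0, N] whose PCS bound is at least P*."""
    for c in range(big_n + 1):
        if pcs_equal_rates(k, big_n, c) >= p_star:
            return c
    return big_n


c_example = smallest_c(3, 10, 0.90)
```

As a sanity check, with k = 2 and c = 0 the integral is the probability that one of two i.i.d. gamma waiting times is the smaller, which is 1/2.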


Selection from restricted families. Let $\pi_1,\ldots,\pi_k$ be k given populations with distributions $F_1,\ldots,F_k$, respectively. It is assumed that there is one among them which is stochastically larger than any of the rest. This population, denoted by $F_{[k]}$, is defined to be the best. It is also assumed that $F_{[k]} <_c G$ (that is, $F_{[k]}$ is convex with respect to G) and that all our distributions are absolutely continuous with the positive real line as the support. Let $X_{i;j,n}$ ($Y_{j,n}$) denote the jth order statistic in a random sample of size n from $F_i$ (G). Define

$T_i = \sum_{j=1}^{r} a_j X_{i;j,n}$, i = 1,...,k,  (31)

where r (1 < r < n) is a fixed integer,

$a_j = gG^{-1}\left(\frac{j-1}{n}\right) - gG^{-1}\left(\frac{j}{n}\right)$, j = 1,...,r-1,  $a_r = gG^{-1}\left(\frac{r-1}{n}\right)$,  (32)

and g is the density associated with G.

If $G(y) = 1 - e^{-y}$, $y \ge 0$, then $a_j = 1/n$ for j = 1,...,r-1, and $a_r = (n-r+1)/n$. Thus $nT_i = X_{i;1,n} + \ldots + X_{i;r-1,n} + (n-r+1)X_{i;r,n}$, which is the well-known total life statistic until the rth failure from $F_i$.




Now, for selecting a subset containing $F_{[k]}$, Gupta and Lu (1979) proposed the rule

$R_{26}$: Select $\pi_i$ if and only if

$T_i \ge c \max_{1 \le j \le k} T_j$,  (33)

where c is the largest number in (0,1) for which the P*-condition is satisfied. They have shown that, if $a_j \ge 0$ for j = 1,...,r, $g(0) \le 1$ and $a_r \ge c$, then

$\inf_{\Omega} P(CS \mid R_{26}) = \int_0^\infty G_T^{k-1}(y/c)\, dG_T(y)$,  (34)

where $G_T$ is the cdf of $T = \sum_{j=1}^{r} a_j Y_{j,n}$, and $\Omega$ is the space of all the k-tuples $(F_1,\ldots,F_k)$ such that there is one among them which is stochastically larger than the others and is convex with respect to G. Thus, the constant c is chosen to be the largest number in (0,1) such that $gG^{-1}\left(\frac{r-1}{n}\right) \ge c$ and the right-hand side of (34) is equal to P*. For the special case of $G(y) = 1 - e^{-y}$, $y \ge 0$, the condition $a_r \ge c$ becomes $c \le (n-r+1)/n$. This special case is a slight generalization of the results of Patel (1976).
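
For the exponential special case, the statistic and a rule of the form (33) can be sketched as below. The weights 1/n and (n-r+1)/n are those stated for $G(y) = 1 - e^{-y}$; the constant c is taken as given rather than solved from (34), and the function names are hypothetical.

```python
def total_life(sample, r):
    """T for exponential G: weights 1/n on the r-1 smallest observations
    and (n - r + 1)/n on the rth smallest, so n*T is the total life
    statistic until the rth failure."""
    n = len(sample)
    ordered = sorted(sample)
    return (sum(ordered[:r - 1]) + (n - r + 1) * ordered[r - 1]) / n


def select_by_total_life(samples, r, c):
    """Select population i when T_i >= c * max_j T_j, with 0 < c < 1."""
    stats = [total_life(s, r) for s in samples]
    cutoff = c * max(stats)
    return [i for i, t in enumerate(stats) if t >= cutoff]


# Example: three samples of size n = 4, censored at the r = 2nd failure,
# with an assumed constant c = 0.5.
samples = [[1, 2, 3, 4], [2, 4, 6, 8], [0.5, 1, 1.5, 2]]
retained = select_by_total_life(samples, 2, 0.5)
```

Only the r smallest observations of each sample enter the statistic, which is what makes the rule usable with censored life-test data.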

Hooper and Santner (1979) considered selection of good populations in terms of $\alpha$-quantiles for star- and tail-ordered distributions using the restricted subset size approach discussed in Section 4.2. Let $\pi_i$ have the distribution $F_i$ and let $F_{[i]}$ denote the distribution having the ith smallest $\alpha$-quantile. Denoting the $\alpha$-quantile of any distribution F by $x_\alpha(F)$, $\pi_i$ is called a good population if $x_\alpha(F_i) > c^* x_\alpha(F_{[k-t+1]})$, $0 < c^* < 1$, in the case of star-ordered families, and if $x_\alpha(F_i) > x_\alpha(F_{[k-t+1]}) - d^*$, $d^* > 0$, in the case of tail-ordered families. The goal of Hooper and Santner (1979) is to select a subset of size not exceeding m ($1 \le m < k-1$) that contains at least one good population.

Selection procedures based on medians. Gupta and Leong (1979), Gupta and Singh (1980), and Lorenzen and McDonald (1981) investigated subset selection procedures based on sample medians for selecting the population with the largest location parameter $\theta_i$ from k given populations belonging to several specific location parameter families. Let $T_i$ be the sample median based on n independent observations from $\pi_i$, i = 1,...,k. Then the procedure studied by all the above authors is

$R_{27}$: Select $\pi_i$ if and only if $T_i \ge \max_{1 \le j \le k} T_j - d$,  (35)

where d is the smallest nonnegative number for which the P*-condition is satisfied.

Gupta and Leong (1979) considered the case of double exponential populations, namely, $f(x - \theta_i) = \tfrac{1}{2}\exp[-|x - \theta_i|]$, i = 1,...,k. Gupta and Singh (1980) considered normal populations with means $\theta_1,\ldots,\theta_k$ and a common known variance. They also studied performance characteristics of $R_{27}$ in the double exponential case. Lorenzen and McDonald (1981) discussed the case of logistic distributions with means $\theta_1,\ldots,\theta_k$ and a common known variance. Gupta and Singh (1980) made a numerical study of the efficiency of $R_{27}$ compared to the Gupta procedure based on sample means. Their study indicates that the procedure based on sample medians, though ordinarily less efficient than the procedure based on sample means, does better when the normal populations are contaminated. Lorenzen and McDonald (1981) compared $R_{27}$ with the procedure based on means, and the rank-sum procedure (in the case of k = 2) of Gupta and McDonald (1970). The general nature of their findings is that the median procedure does better than the means procedure when there is contamination and it does better than the rank procedure when the $\theta_i$ are believed to be roughly in an equally spaced configuration.
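
A small sketch of a rule of the form (35), alongside the corresponding means-based rule, is given below to illustrate the effect of contamination. The data, the constant d, and the function name are invented for illustration only; in practice d would be obtained from the P*-condition for the assumed family.

```python
from statistics import mean, median


def select_by_statistic(samples, d, statistic):
    """Select population i when its statistic is within d of the largest."""
    values = [statistic(s) for s in samples]
    cutoff = max(values) - d
    return [i for i, v in enumerate(values) if v >= cutoff]


# The first sample contains one gross outlier (100).
samples = [[1, 2, 100], [3, 4, 5], [0, 1, 2]]

by_median = select_by_statistic(samples, 2.5, median)
by_mean = select_by_statistic(samples, 2.5, mean)
```

With these made-up data the single outlier pulls the first sample mean far above the others, so the means-based rule retains only that population, while the median-based rule still retains the second population, whose bulk of observations is largest.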


Robust nonparametric procedures. In Section 4.6, we discussed the difficulty in establishing the LFC for maximum type procedures based on ranks. Hsu (1980) proposed a rule based on pairwise (rather than joint) ranking of the samples for which the PCS is minimized when the distributions are identical. Let $\pi_1,\ldots,\pi_k$ have distributions $F_{\theta_1},\ldots,F_{\theta_k}$, respectively. Let $X_{i1},\ldots,X_{in}$ denote the observations from $\pi_i$, i = 1,...,k. For $1 \le i, j \le k$, i ≠ j, define $R_{ij}$ to be the rank-sum of the sample from $\pi_i$ in the combined sample from $\pi_i$ and $\pi_j$; also, let $D^{ij}_{(1)} < \ldots < D^{ij}_{(n^2)}$ denote the $n^2$ ordered differences $X_{i\alpha} - X_{j\beta}$, $\alpha, \beta$ = 1,...,n. Then the rule of Hsu (1980) is

$R_{28}$: Select $\pi_i$ if and only if

$T_i \ge \max_{j} T_j$ and/or $\max_{j \ne i} R_{ji} \le r$,  (36)

where $T_i = \sum_{j=1}^{k} D^{ij}_{med} / k$ and

$D^{ij}_{med} = D^{ij}_{(\ell+1)}$ if $n^2 = 2\ell+1$,
$D^{ij}_{med} = (D^{ij}_{(\ell)} + D^{ij}_{(\ell+1)})/2$ if $n^2 = 2\ell$.

Here $D^{ii}_{med} = 0$. The constant r = r(n,P*) is the smallest integer such that

$P_0\{\max_{j \ne 1} R_{j1} > r\} \le 1 - P^*$,  (37)

where $P_0$ denotes the probability evaluated when the distributions are identical. It should be pointed out that $D^{ij}_{med}$ is the usual Hodges-Lehmann estimator of $\theta_i - \theta_j$; the averaged estimator $T_i$ of $\theta_i$ was introduced by Lehmann (1963b) to make the estimators compatible.
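
The pairwise Hodges-Lehmann estimates and their averaged version can be computed as in the sketch below, which takes each pairwise estimate to be the median of all differences between two samples and averages over the k populations with the convention that the estimate of a population against itself is zero. The function names and the small data set are illustrative assumptions.

```python
from statistics import median


def hodges_lehmann_shift(sample_i, sample_j):
    """Median of all pairwise differences X_i(alpha) - X_j(beta)."""
    return median(a - b for a in sample_i for b in sample_j)


def averaged_estimates(samples):
    """T_i = (1/k) * sum over j of the pairwise estimate for (i, j),
    where the (i, i) term is zero and is therefore skipped."""
    k = len(samples)
    return [
        sum(hodges_lehmann_shift(samples[i], samples[j])
            for j in range(k) if j != i) / k
        for i in range(k)
    ]


samples = [[0, 1, 2], [10, 11, 12], [4, 5, 6]]
t_values = averaged_estimates(samples)
```

Because every pairwise estimate is antisymmetric in (i, j), the averaged values sum to zero, and differences such as t_values[1] - t_values[0] serve as mutually consistent estimates of the pairwise shifts.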

In another paper, Hsu (1981a) proposed a class of nonparametric subset selection procedures for unequal sample sizes situations which are based on two-sample linear rank statistics.

Selection  of  variables  in  linear  regression.  Applications  of  regression 
analysis  in  practice  for  prediction  purposes  often  involve  a  large  number  of 
independent  variables.  In  such  situations,  a  subset  of  these  predictor 
variables  may  be  sufficient  for  "adequate"  prediction.  In  this  sense,  there 
arises  a  problem  of  selecting  a  "good"  subset  of  these  variables.  For  nice 
reviews  of  several  criteria  and  techniques  that  have  been  used  in  practice, 
reference  may  be  made  to  Hocking  (1976)  and  Thompson  (1978a, b).  The  ad  hoc 
procedures  described  in  the  papers  of  Hocking  and  Thompson,  however,  were  not 
designed  to  select  a  good  subset  with  any  control  on  the  probability  of 
selecting  the  important  variables.  The  first  papers  to  formulate  this  problem 
in  the  framework  of  Gupta's  subset  selection  were  McCabe  and  Arvesen  (1974), 
and  Arvesen  and  McCabe  (1975). 

Consider the linear model

$Y = X\beta + \varepsilon$,  (38)

where X is an N×p known matrix of rank p < N, $\beta$ is a p×1 parameter vector, and $\varepsilon \sim N(0, \sigma^2 I_N)$, $I_N$ being an identity matrix. The model (38) which includes p independent variables is considered to be the "true" model. Consider all $k = \binom{p}{t}$ reduced models that are obtained by retaining t out of the p variables. The comparisons of these reduced models are made under the true model assumptions. Let $\sigma_1^2,\ldots,\sigma_k^2$ denote the error variances of these reduced models. The best subset of the p variables is defined to be the set for which the error variance of the corresponding reduced model is $\sigma_{[1]}^2$. We will call this model the best reduced model of size t. Arvesen and McCabe (1975) proposed a rule to select a nonempty subset of all reduced models of size t so that the best reduced model is included with a minimum guaranteed probability P*. They proposed a scale type procedure based on the error sums of squares associated with these models. However, these sums of squares are not independent and an exact evaluation of the infimum of the PCS is difficult. Arvesen and McCabe showed that the PCS is asymptotically ($N \to \infty$) minimized when $\beta_1 = \ldots = \beta_p = 0$. Still the evaluation of the constant associated with the procedure is not simple. An algorithm for determining the constant using Monte Carlo methods is given by McCabe and Arvesen (1974).

Subset  selection  for  screening  in  two-stage  procedures.  Suppose  we  have 
k normal populations with unknown means $\theta_1, \ldots, \theta_k$ and a common variance $\sigma^2$.
The population associated with the largest $\theta_i$ is the best population. For
selecting a single population as the best under the indifference zone
formulation of Bechhofer (1954), a two-stage procedure is necessary if $\sigma^2$
is unknown. The main purpose of the first stage is to obtain an estimate of
$\sigma^2$ so that the total sample size necessary to satisfy the P*-condition can
be determined. If $\sigma^2$ is known, then the single stage procedure of Bechhofer
(1954)  applies.  However,  in  this  latter  situation,  one  can  use  a  two-stage 
procedure  where  the  first  stage  is  meant  to  effectively  screen  out  inferior 
populations.  Obviously,  this  is  done  by  using  a  subset  selection  procedure 
designed  to  retain  superior  populations.  Early  investigations  of  Cohen  (1959) 
and  Alam  (1970)  were  mostly  confined  to  the  special  case  of  k  =  2.  The 
1977 paper of Tamhane and Bechhofer for $k \ge 2$ renewed the interest in such
procedures.  Their  procedure  is  described  below. 

$R_{29}$: Take a first-stage sample of $n_1$ observations from each population.
Retain the populations $\pi_i$ for which $\bar{X}_i \ge \max_{1 \le j \le k} \bar{X}_j - h\sigma$, where the $\bar{X}_i$ are the
sample means and $h > 0$ is to be specified prior to the experimentation.
Let $S$ denote the set of populations thus retained. If $S$ consists of only one
population, stop sampling and select this population. If $S$ consists of more than
one population, take an additional sample of size $n_2$ from each population in $S$.
Select the population associated with the largest $\bar{Y}_i$, where the $\bar{Y}_i$ are the means
based on $n_1 + n_2$ observations for those populations in $S$.

It should be noted that the fixed-sample procedure of Bechhofer (1954)
is obtained as a special case of $R_{29}$ by letting $h = 0$ or $\infty$. For the rule $R_{29}$,
PCS should be guaranteed to be at least $P^*$ whenever $\theta = (\theta_1, \ldots, \theta_k)$ belongs
to $\Omega(\delta^*) = \{\theta: \theta_{[k]} - \theta_{[k-1]} \ge \delta^*\}$. For any given $(k, \delta^*, P^*)$, there are an
infinite number of combinations of $(n_1, n_2, h)$ which will give the above
guarantee for PCS. Tamhane and Bechhofer (1977) used a minimax criterion,
namely, minimize $\sup_{\Omega} E(T)$ subject to $\inf_{\Omega(\delta^*)} \mathrm{PCS} \ge P^*$, where $T = kn_1 + |S|n_2$ is the total
sample size required. However, the LFC is shown to be $\theta_{[1]} = \cdots = \theta_{[k-1]} = \theta_{[k]} - \delta^*$
only in the case of $k = 2$. For $k > 2$, Tamhane and Bechhofer (1977) obtained a
conservative solution by taking the infimum over $\Omega(\delta^*)$ of a lower bound of the
PCS; in a subsequent paper, Tamhane and Bechhofer (1979) provided an improved
yet conservative solution by using a sharper lower bound for the PCS. Their
numerical study shows that the procedure $R_{29}$ is very effective as a screening
procedure, especially as $k$ increases.
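
The two-stage procedure of Tamhane and Bechhofer (1977) described above can be explored by simulation. The Python sketch below is only an illustration under stated assumptions: the screening margin is taken as h times sigma exactly as printed in the rule above (the scaling of the margin should be checked against the original paper), the populations are simulated at a slippage configuration, and the function names, seed and parameter values are hypothetical. Since the slippage configuration is shown to be least favorable only for k = 2, the estimate for larger k is the PCS at that configuration and not necessarily the infimum of the PCS.

```python
import random
from statistics import fmean


def two_stage_select(means, sigma, n1, n2, h, rng):
    """One simulated run; returns (selected index, total number of observations)."""
    k = len(means)
    stage1 = [[rng.gauss(m, sigma) for _ in range(n1)] for m in means]
    xbar = [fmean(s) for s in stage1]
    cutoff = max(xbar) - h * sigma  # assumed form of the screening margin
    retained = [i for i in range(k) if xbar[i] >= cutoff]
    if len(retained) == 1:
        # Screening left a single population: stop after the first stage.
        return retained[0], k * n1
    # Second stage: sample only the retained populations, then pool both stages.
    ybar = {}
    for i in retained:
        stage2 = [rng.gauss(means[i], sigma) for _ in range(n2)]
        ybar[i] = fmean(stage1[i] + stage2)
    best = max(retained, key=lambda i: ybar[i])
    return best, k * n1 + len(retained) * n2


def estimate_pcs(k, delta, sigma, n1, n2, h, reps, seed=0):
    """Monte Carlo estimates of the PCS and of E(T) at a slippage configuration."""
    rng = random.Random(seed)
    means = [0.0] * (k - 1) + [delta]  # the best mean exceeds the others by delta
    hits = 0
    total = 0
    for _ in range(reps):
        best, t = two_stage_select(means, sigma, n1, n2, h, rng)
        hits += int(best == k - 1)
        total += t
    return hits / reps, total / reps


pcs, avg_t = estimate_pcs(k=4, delta=0.5, sigma=1.0, n1=10, n2=10, h=1.0, reps=2000)
print(pcs, avg_t)
```

With h = 0 only the first-stage leader is retained, and with a very large h every population goes on to the second stage; both are single-stage procedures, in line with the remark above about the fixed-sample procedure.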

We initially pointed out that, when $\sigma^2$ is unknown, the first stage of a
two-stage procedure is utilized for estimating $\sigma^2$ and determining the total
sample  size.  If  one  wants  to  further  use  the  idea  of  screening,  it  can  be 
done  by  a  three-stage  procedure  where  the  first  stage  is  used  to  determine 
the  additional  sample  sizes  necessary  in  the  subsequent  stages,  the  second 
stage  is  used  to  eliminate  inferior  populations  by  a  subset  rule,  and  the 
third  stage  (if  necessary)  to  make  the  final  decision.  Such  procedures  have 
been studied by Tamhane (1976) and Hochberg and Marcus (1981).

6. RECENT DEVELOPMENTS

Developments  that  took  place  in  the  last  three  or  four  years  constitute 
not  only  continuation  of  research  on  several  topics  that  we  discussed  in  the 
preceding  sections,  but  also  some  new  aspects  and  directions. 

6.1 Decision-Theoretic Developments

In this subsection, we will discuss minimax, Γ-minimax, Bayes and
empirical  Bayes  procedures  for  three  different  goals,  namely,  (i)  selecting 
the  best  population,  (ii)  selecting  good  populations,  and  (iii)  selecting 
good  populations  in  comparison  to  a  control.  However,  results  relating  to 
two-stage  and  sequential  procedures  will  be  discussed  separately  along  with 
some  results  for  classical  two-stage  procedures.  Also,  some  locally  optimal 
selection  procedures  will  be  reviewed  elsewhere  in  this  section. 

Selecting the best populations. Let $\pi_1, \ldots, \pi_k$ have densities $f(x, \theta_i)$,
where $\theta_i$ belongs to an interval of the real line, $i = 1, \ldots, k$. It is
assumed that $f(x, \theta)$ has a monotone likelihood ratio in $x$. Bjørnstad (1981a)
considered the goal of selecting the $t$ best populations, namely, those
associated with the $t$ largest $\theta_i$'s. Here the decision space consists of all
subsets of $\{1, \ldots, k\}$ of size $\ge t$. Bjørnstad considered nonnegative, semi-additive
loss functions of the form $L(\theta, d) = a(|d|) \sum_{i \in d} L_i(\theta)$, where $d$ denotes
the subset selected and $|d|$ its size. Here $a(|d|) > 0$ and $L_i(\theta) > 0$. Bjørnstad
(1981a) has shown that, under certain conditions, the procedure that selects
the $t$ populations corresponding to the $t$ largest $X_i$'s ($X_i$ is an observation
from $\pi_i$) minimizes the risk uniformly in $\theta$ in the class of
permutation-invariant procedures. He has also shown a class of likelihood-ratio type
procedures to be admissible for monotone loss functions.

In Section 5.1, we discussed asymptotically optimal rules of Bickel
and Yahav (1977) for selecting the best population. In a recent paper,
Bickel and Yahav (1982) showed that the same rules also minimize the
asymptotic  risk  for  a  wide  class  of  smooth  "monotone"  loss  functions  within 
the  class  of  procedures  with  PCS  bounded  below  by  a  specified  P*.  They 
also  showed  that  Gupta's  maximum  type  rule  with  P*  as  the  minimum  PCS  is 
'asymptotically  optimal  within  the  same  class  of  procedures  and  for  the  same 
class  of  loss  functions  for  essentially  any  prior  for  which  the  empirical 
d.f.  of  the  means  tends  to  a  fixed  d.f.  with  prior  probability  1,  and  whose 
essential  supremum  is  finite'. 

For  selecting  the  best  population  in  a  randomized  complete  block  design, 
Huang and Tseng (1983) have obtained Γ-minimax procedures.

Selecting good populations. Let $\pi_1, \ldots, \pi_k$ be normal populations with
unknown means $\theta_1, \ldots, \theta_k$ and a common known variance $\sigma^2$. A population $\pi_i$ is
said to be good if $\theta_i \ge \theta_{[k]} - \Delta$, $\Delta > 0$ given, and bad otherwise. Gupta and
Kim (1981) considered the loss function

$L(\theta, d) = \sum_{i \in d} L_B(\theta_i - \theta_{[k]} + \Delta) + \sum_{i \notin d} L_G(\theta_i - \theta_{[k]} + \Delta)$,

where $d$ is the selected subset of $\{1, \ldots, k\}$, $L_B$ is nonincreasing, $L_G$ is
nondecreasing, $L_B(y) = 0$ for $y > 0$, and $L_G(y) = 0$ for $y < 0$. Assuming that
$\theta$ has an exchangeable normal prior, Gupta and Kim (1981) have shown that
Gupta's maximum type rules are extended Bayes. For a simple loss function,
they have made Monte Carlo comparisons of the maximum type and the average
type rules with the Bayes rule. As in the studies of Chernoff and Yahav
(1977), and Gupta and Hsu (1978), the study of Gupta and Kim (1981) indicates
that the maximum type rules perform well.

Bjørnstad (1981b) developed a theory for a class of procedures,
called Schur procedures, and applied it to certain minimax problems. Let
$\pi_1, \ldots, \pi_k$ have densities $f(x - \theta_i)$, $i = 1, \ldots, k$. We assume that $f(x - \theta)$ has
monotone likelihood ratio in $x$. Given the observation $x = (x_1, \ldots, x_k)$, a
selection rule is defined by $\delta(x) = (\delta_1(x), \ldots, \delta_k(x))$, where $\delta_i(x)$ denotes
the individual selection probability for $\pi_i$, $i = 1, \ldots, k$. We consider the
class $C$ of just, permutation-invariant and translation-invariant procedures.
Now, $\delta \in C$ if and only if there exists a permutation-symmetric function
$\delta': \mathbb{R}^{k-1} \to \mathbb{R}$, which is nonincreasing in each component, such that for
every $i$, $\delta_i(x) = \delta'(x_1 - x_i, \ldots, x_{i-1} - x_i, x_{i+1} - x_i, \ldots, x_k - x_i)$. A subset selection
procedure $\delta = (\delta_1, \ldots, \delta_k)$ is said to be a Schur-procedure if $\delta \in C$ and $\delta'$
is a Schur-concave function. Bjørnstad (1981b) has obtained several
properties of Schur procedures. Subject to controlling the minimum expected
number of good populations selected or the probability that the best
population is in the selected subset, he has obtained minimax procedures
using the criterion of minimizing the expected number of bad populations (or
a similar criterion).

Selecting good populations compared to a control. Let $\pi_1, \ldots, \pi_k$ be the
populations that are compared with the control $\pi_0$. Let $\pi_i$ be characterized
by $\theta_i$, $i = 0, 1, \ldots, k$. Gupta and Hsiao (1981) considered the case of normal
populations with unknown means $\theta_0, \theta_1, \ldots, \theta_k$ and known variances. They called
a population $\pi_i$ good if $|\theta_i - \theta_0| \le \Delta$ and bad if $|\theta_i - \theta_0| \ge \Delta + \epsilon$, where $\Delta > 0$ and
$\epsilon > 0$ are specified constants. They used a simple additive loss function
which is made up of loss $L_1$ for every good population that is not selected,
and loss $L_2$ for every bad population that is included. They considered both
cases of $\theta_0$ known and unknown and obtained minimax, Γ-minimax and Bayes
rules. Their Bayes rule was derived under a normal prior for $\theta$.

In another paper, Gupta and Hsiao (1983) considered uniform distributions
on $(0, \theta_i)$, $i = 0, 1, \ldots, k$. They defined a population $\pi_i$ to be good if $\theta_i > \theta_0$
and bad otherwise. They considered the loss function

$L(\theta, s) = L_1 \sum_{i \in A} (\theta_i - \theta_0) + L_2 \sum_{i \in B} (\theta_0 - \theta_i)$,

where $s$ denotes the selected subset, $A$ denotes the set of
good populations that are not selected, and $B$ denotes the set of bad populations
that are selected. Gupta and Hsiao (1983) derived empirical Bayes rules in
both cases of $\theta_0$ known and unknown.
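
The two loss functions just described are easy to evaluate for a given parameter configuration and selected subset. The following Python sketch does so; it is an illustration only, with hypothetical function names and numerical values. The first function follows the additive loss of Gupta and Hsiao (1981) for the normal means case (populations in the gap between good and bad contribute no loss), and the second follows the linear loss of Gupta and Hsiao (1983).

```python
def additive_loss(theta0, thetas, selected, delta, eps, l1, l2):
    """Loss l1 per good population left out, l2 per bad population included."""
    loss = 0.0
    for i, th in enumerate(thetas):
        dist = abs(th - theta0)
        if dist <= delta and i not in selected:
            loss += l1  # a good population was not selected
        elif dist >= delta + eps and i in selected:
            loss += l2  # a bad population was selected
    return loss


def linear_loss(theta0, thetas, selected, l1, l2):
    """Loss proportional to the distance from the control parameter."""
    loss = 0.0
    for i, th in enumerate(thetas):
        if th > theta0 and i not in selected:
            loss += l1 * (th - theta0)  # good population left out (set A)
        elif th <= theta0 and i in selected:
            loss += l2 * (theta0 - th)  # bad population included (set B)
    return loss


# Hypothetical configurations, for illustration only
a = additive_loss(1.0, [1.0, 2.5, 5.0], {1, 2}, delta=1.0, eps=1.0, l1=1.0, l2=2.0)
b = linear_loss(2.0, [1.0, 2.5, 5.0], {0, 1}, l1=1.0, l2=2.0)
print(a, b)
```

The populations are indexed from 0 in the code, and `selected` is the set of indices in the chosen subset.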

Gupta and Leu (1983a) also considered selection from uniform distributions
on $(0, \theta_i)$, $i = 0, 1, \ldots, k$. But they defined $\pi_i$ to be good if $|\theta_i - \theta_0| < \Delta$ and
bad otherwise. They derived Bayes and empirical Bayes procedures (both $\theta_0$
known and unknown cases) using a loss function $L(\theta, s)$, where $s$ is the
selected subset, whose definition involves positive constants $c_i$ and the
indicator function $I$.

6.2 Isotonic Procedures

In  this  section  as  well  as  in  the  previous  sections,  we  have  discussed 
the  problem  of  comparing  several  populations  with  a  control  and  the  contributions 
of  several  authors  in  this  respect.  It  has  been  assumed  by  these  authors  that 
there is no information about the ordering of the unknown parameters $\theta_i$ of
these populations. In some situations, we may have partial prior information
in the form of a simple or partial order relationship among the unknown
parameters. This is typical, for example, in experiments involving different
dose levels of a drug where the treatment effects will have a known ordering.

Let $\pi_1, \ldots, \pi_k$ be the k populations that are compared with the control population
$\pi_0$. Let $\pi_i$ have distribution $F_{\theta_i}$, $i = 0, 1, \ldots, k$. Then it is assumed that the
$\theta_i$ are unknown, but it is known that $\theta_1 \le \theta_2 \le \cdots \le \theta_k$. A population $\pi_i$
($i = 1, \ldots, k$) is defined to be good if $\theta_i > \theta_0$ and bad otherwise. The goal
is to select the good populations. We would expect any reasonable procedure
R to have the property: If R selects $\pi_i$, then it selects all populations $\pi_j$
for $j > i$. This is the isotonic behavior of the procedure R. Naturally,
one would propose rules based on isotonic estimators of $\theta_1, \ldots, \theta_k$. Such
procedures have been recently investigated by Gupta and Yang (1981) in the
case of normal means (common variance $\sigma^2$ may be known or unknown), by Gupta
and W. T. Huang (1982) in the case of binomial populations with success
probabilities $\theta_i$, and by Gupta and Leu (1983b) in the case of two-parameter
exponential populations with location parameters (guarantee times) $\theta_i$ and a
common (known or unknown) scale parameter. All these authors have considered
both cases of known and unknown $\theta_0$.
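
As an illustration of a rule based on isotonic estimators, the Python sketch below computes the nondecreasing least-squares fit to the sample means by the pool-adjacent-violators algorithm and then retains every population whose isotonic estimate clears a cutoff above a known control value. The selection step is a hypothetical form chosen for illustration, not the rule of any of the papers cited above; the function names and the cutoff are assumptions of this sketch.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit."""
    n = len(values)
    weights = [1.0] * n if weights is None else list(weights)
    blocks = []  # each block holds [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the nondecreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wsum = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wsum, wsum, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


def isotonic_select(sample_means, theta0, cutoff):
    """Indices whose isotonic estimate is at least theta0 + cutoff (hypothetical rule)."""
    est = pava(sample_means)
    return [i for i, e in enumerate(est) if e >= theta0 + cutoff]


est = pava([1.0, 3.0, 2.0, 4.0])
chosen = isotonic_select([1.0, 3.0, 2.0, 4.0], theta0=2.0, cutoff=0.0)
print(est, chosen)
```

Because the fitted values are nondecreasing in the index, any rule of this threshold form has the isotonic property described above: if a population is selected, so is every population with a larger index.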

6.3 Locally Optimal Subset Selection Rules

In  Section  5.4,  we  discussed  a  locally  optimal  subset  selection  rule  based 
on  ranks  given  by  Gupta,  Huang  and  Nagel  (1979).  Their  local  optimality 
criterion involved maximizing the PCS in a neighborhood of an equi-parameter
point. Locally optimal rules involving different optimality criteria have
been recently investigated by Huang and Panchapakesan (1982b) and Huang,
Panchapakesan and Tseng (1984).

Huang and Panchapakesan (1982b) considered two goals, namely, selecting
the best from k populations $\pi_1, \ldots, \pi_k$, and selecting from $\pi_1, \ldots, \pi_k$ those
populations, if any, that are better than $\pi_0$, which is the control population.
Let $\pi_i$ have density $f(x, \theta_i)$, $i = 0, 1, \ldots, k$, satisfying certain regularity
conditions. For the first goal, the best population is the one associated
with the largest among $\theta_1, \ldots, \theta_k$. For the second goal, $\pi_i$ is defined to be
better than $\pi_0$ if $\theta_i > \theta_0$ and inferior otherwise. As in the paper of Gupta,
Huang and Nagel (1979), it is assumed that the functional form of $f(x, \theta)$ is
known. For selecting the best population, Huang and Panchapakesan (1982b)
derived a permutation-invariant rule $\delta$ such that $\inf_{\Omega_0} P(\mathrm{CS} \mid \delta) = P^*$, where
$\Omega_0 = \{\theta: \theta_1 = \cdots = \theta_k\}$. Their rule is locally optimal in the sense that it
is strongly monotone in a neighborhood of any point $\theta_0$ in $\Omega_0$. For selecting
populations better than a standard, it is assumed that $\theta_0$ is unknown but
bounded by a known quantity. Huang and Panchapakesan considered the class of
rules $\delta$ for which

$P(\pi_i \text{ is selected} \mid \theta \in \Omega_0) \le \gamma$, $i = 1, \ldots, k$,

and obtained a locally optimal rule in this class which is optimal in the
sense that it maximizes


$\sum_{i=1}^{k} \frac{\partial}{\partial \theta_i} P(\pi_i \text{ is selected} \mid \theta_j = \theta^* < \theta_i,\ j \ne i) \Big|_{\theta_i = \theta^*}$.


The  above  criterion  of  local  optimality  is  related  to  the  ability  of  a  rule 
in choosing a population which is 'distinctly better' than the control while
all  others  are  not. 

Huang,  Panchapakesan  and  Tseng  (1984)  obtained  a  locally  optimal  rule 
for  selecting  populations  better  than  a  control.  Their  rule  is  not  based  on 
ranks but on statistics $T_{i0}$ suitably defined to indicate the difference
between $\pi_i$ and $\pi_0$, $i = 1, \ldots, k$. Also, their procedure does not require equal
sample sizes. They considered the class of rules for which the probability of
selecting $\pi_i$ when $\theta_1 = \cdots = \theta_k = \theta_0$ is fixed at the level $\gamma_i$, $i = 1, \ldots, k$.

The  local  optimality  criterion  used  by  them  amounts  to  maximizing  the 
efficiency  in  a  certain  sense  of  the  rule  in  picking  out  the  superior 
populations  in  the  direction  of  each  being  superior  while  all  others  are 
identical  to  the  control.  Huang,  Panchapakesan  and  Tseng  have  applied  their 
general  results  to  the  following  special  cases:  (a)  normal  means  comparison  - 
common  known  variance,  (b)  normal  means  comparison  -  common  unknown  variance, 
(c)  gamma  scale  parameters  comparison  -  known  (unequal)  shape  parameters, 
and  (d)  regression  slopes. 

6.4 Two-Stage and Sequential Rules

In Section 5.7, we described a two-stage procedure ($R_{29}$) of Tamhane
and Bechhofer (1977) where the first stage involves a subset selection rule
employed to eliminate inferior populations. Such rules have been called
elimination type rules. We also noted the difficulty in establishing the
LFC when $k \ge 3$. Consider the problem and the goal of Tamhane and Bechhofer
(1977). They used Gupta's maximum type rule for screening based on the
first stage sample. Let us call this procedure $S_1$. Gupta and Miescke
(1982) considered $S_1$ and two other screening procedures $S_2$ and $S_3$. $S_2$
retains populations that yield the $t$ largest $\bar{X}_i$, $2 \le t \le k-1$, $t$ fixed, and
$S_3$ retains $\pi_i$ if and only if $\bar{X}_i \ge c$, $i = 1, \ldots, k$. Let $d_1$ and $d_2$ denote
two decision rules at the second stage both choosing the population that
yielded the largest sample mean, $d_1$ using only the second stage samples and
$d_2$ using combined samples of both stages. The Tamhane-Bechhofer rule
corresponds to $(S_1, d_2)$. Gupta and Miescke (1982) proved that for $(S_\alpha, d_\beta)$,
$\alpha = 2, 3$; $\beta = 1, 2$; the LFC in $\Omega(\delta^*)$ is the slippage configuration $\theta_{\delta^*}$ given by
$\theta_{[1]} = \cdots = \theta_{[k-1]} = \theta_{[k]} - \delta^*$. For $(S_1, d_1)$, they have obtained a lower bound
for the PCS which is minimized over $\Omega(\delta^*)$ at $\theta_{\delta^*}$, a result similar to that of
Tamhane and Bechhofer (1979).
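
The three first-stage screening procedures differ only in how the retained set is formed from the first-stage sample means. A minimal Python sketch of the three retention rules follows; the function names, the margin argument and the numerical values are hypothetical, and the constants that guarantee a given PCS are not computed here.

```python
def screen_max_type(xbar, margin):
    """Maximum type screening: keep means within `margin` of the largest."""
    top = max(xbar)
    return [i for i, x in enumerate(xbar) if x >= top - margin]


def screen_top_t(xbar, t):
    """Keep the populations with the t largest sample means (t fixed)."""
    order = sorted(range(len(xbar)), key=lambda i: xbar[i], reverse=True)
    return sorted(order[:t])


def screen_threshold(xbar, c):
    """Keep every population whose sample mean is at least the constant c."""
    return [i for i, x in enumerate(xbar) if x >= c]


xbar = [1.0, 3.0, 2.6, 0.5]  # hypothetical first-stage sample means
print(screen_max_type(xbar, 0.5), screen_top_t(xbar, 2), screen_threshold(xbar, 0.8))
```

The first rule always retains at least one population and the second always retains exactly t, whereas the third can retain none when every sample mean falls below the constant.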

Gupta  and  Kim  (1982)  proposed  a  two-stage  elimination  type  procedure 
for  the  normal  means  problem  when  the  common  variance  is  unknown.  It 
should  be  noted  that  for  this  problem,  Tamhane  (1976)  and  Hochberg  and 
Marcus  (1981)  proposed  three-stage  procedures  as  pointed  out  in  Section  5.7. 
Gupta  and  Kim  gave  a  new  design  criterion  and  obtained  a  sharp  lower  bound 
on  the  PCS. 

For  the  normal  means  problem  (common  known  variance),  Gupta  and  Miescke 
(1984a)  studied  two-stage  procedures  with  screening  at  the  first  stage  using  a 
Bayesian approach. The sampling is done as in the case of the procedure $R_{29}$.
Under  the  assumption  of  a  loss  function  which  includes  the  cost  of  sampling, 
they  derived  a  Bayes  procedure  with  respect  to  i.i.d.  normal  priors. 

In another paper, Gupta and Miescke (1984b) studied permutation-invariant
sequential procedures for selection from $\pi_1, \ldots, \pi_k$ belonging to an
exponential family, with associated parameters $\theta_1, \ldots, \theta_k$, respectively. For
the class of procedures considered, at stage $m$ ($m = 1, 2, \ldots$) observations are
drawn from all eligible populations at that stage. Then the procedure either
stops and makes a final subset selection from the eligible populations, or it
selects a subset from the eligible populations and proceeds to stage $m+1$. Under
a general loss structure (favoring large parameters), Gupta and Miescke (1984b)
have shown that at all stages the finally selected subsets have to be associated
with the largest sufficient statistics from the eligible populations. In fact,
these natural final decisions have been proved to be uniformly optimum in
terms of the associated risk.

For a survey of these and other multi-stage selection procedures,
reference may be made to Miescke (1982).

6.5 Other Developments

In Section 5.7, we discussed the problem of selecting a set of good
predictor variables. In the formulation of Arvesen and McCabe (1975) only
reduced models involving t (fixed) out of p independent variables were
considered. Huang and Panchapakesan (1982a) formulated the problem as one
of eliminating all inferior models (compared with the full model which is
called the true model) using the criterion of expected residual sum of squares
to define inferior models. Hsu and Huang (1982) investigated a sequential
procedure for selecting good regression models. Recently, Gupta, Huang
and Chang (1984) have discussed selection of an optimal subset of predictor
variables using the criterion of expected residual mean squares to define
inferior models. Their treatment of the problem differs from that of earlier
papers in the sense that they use simultaneous tests of a family of hypotheses.

In Section 3, we defined a rule which is really a class of rules based
on contrasts. Let $C$ denote this class. A procedure in this class can select an empty
subset unless P* is suitably (depending on $k, c_1, \ldots, c_{k-1}$) large. Let $C^+$
denote the subclass of rules that select a nonempty subset. Deely and Gupta
(1968) showed that, for the normal means problem (common known variance), the
rule $R_1$ of Gupta (1956) is optimal (in the sense of minimizing the expected
subset size) in the class $C^+$ when the means are in a slippage configuration
$(\theta, \ldots, \theta, \theta + \delta)$, $\delta > 0$, if $\sqrt{n}\,\delta$ is greater than a constant depending on $k$ and
P* only. This result (which essentially amounts to considering $k = 2$, if
we consider all configurations) was extended by Gupta and Miescke (1981) to
$k = 3$ normal populations. They proved the following result: Let
$P^* \in (0,1)$ ($P^* \in [2/3,1)$) and $n$ be fixed. Then $R_1$ is optimal within $C$
($C^+$) at every configuration $\theta = (\theta_1, \theta_2, \theta_3)$ such that $\theta_{[3]} - \theta_{[1]}$ is 'sufficiently'
large.


Gupta and W. T. Huang (1981) presented a survey of results on mixtures
of distributions and considered selection from $\pi_1, \ldots, \pi_k$ where $\pi_i$ has the
density $f_i(x) = \sum_{j=1}^{m} \alpha_{ij} g_j(x)$, where $g_j(x)$, $j = 1, \ldots, m$, are known densities
and $\alpha_{i1}, \ldots, \alpha_{im}$ are unknown constants in (0,1) such that $\sum_{j=1}^{m} \alpha_{ij} = 1$, $i = 1, \ldots, k$.
They considered several procedures for selecting the population associated with
the largest $A_i$ where $A_i = A(\alpha_{i1}, \ldots, \alpha_{im})$, $i = 1, \ldots, k$.

Bjørnstad (1983) considered a class of estimators called expansion
estimators to be used in defining a subset selection procedure. These
estimators of the population parameters are obtained by employing preliminary
multiple comparisons procedures, and they tend to expand the differences
between them, compared to the usual estimators. Bjørnstad (1983) has studied
a class of maximum type procedures based on these expansion estimators.

Dudewicz  and  Taneja  (1981)  considered  the  problem  of  selecting  the  best 
multivariate  populations  when  the  comparison  of  populations  is  made  through  a 
preference  function  g  of  the  mean  vectors. 

Broström (1981) investigated a technique, called sequential rejection,
for  selection  procedures.  This  technique  is  applicable  to  "all  or  nothing" 
type  goals,  such  as  selecting  a  subset  containing  all  good  populations,  or 
selecting  a  subset  containing  no  bad  populations.  Chotai  (1980a, b)  has 
discussed  several  procedures  based  on  likelihood  ratio. 

In  related  developments,  Hsu  (1981b)  obtained  parametric  and  nonparametric 
simultaneous  confidence  intervals  for  all  distances  from  the  best  under 
the location model. In the parametric case, he has shown that these intervals
represent a substantial strengthening of the probability statements associated
with  the  procedures  of  Bechhofer  (1954)  and  Gupta  (1956,  1965).  Jeyaratnam 
and  Panchapakesan  (1984)  have  discussed  the  problem  of  estimating  after 
selection  using  the  subset  selection  rule  of  Gupta  (1956)  for  k  =  2  normal 
populations  with  common  known  variance. 

7.  IMPACT  OF  DEVELOPMENTS  AND  FUTURE  -  AN  ASSESSMENT 

In  the  preceding  sections,  we  have  attempted  to  provide  an  introduction 
to  the  beginnings  of  the  ranking  and  selection  theory  and  to  trace  the  develop¬ 
ments  in  the  subset  selection  theory  that  took  place  since  then.  The  literature 
on  the  subject  of  ranking  and  selection  on  the  whole  has  grown  enormously 
over these years, thanks to the vigorous efforts of many research workers. The
research workers in this field are no longer confined to a few schools or a
few geographical parts of the world. It serves well to take a look at the
accomplishments  of  the  past  in  order  to  visualize  the  potential  needs  of 
the  future.  Our  general  assessment  here  is  not  confined  to  subset  selection 
alone  but  to  the  ranking  and  selection  field  as  a  whole. 

In  tracing  the  beginnings  of  what  are  now  called  selection  and  ranking 
procedures  we  referred  to  the  indifference  zone  and  subset  selection  approaches. 
The  inadequacies  of  the  tests  of  homogeneity  and  the  multiple  comparisons 
techniques  to  answer  the  concerns  of  the  experimenter  regarding  the  best 
population  or  subset  of  best  populations  had  been  recognized  by  the  late 
forties.  The  slippage  problems  of  Mosteller  (1948)  and  Paulson  (1952) 
were  early  efforts  in  the  direction  of  more  meaningful  formulations.  The 
early  papers  dealing  with  choosing  the  best  population  were  Bahadur  (1950) 
and  Bahadur  and  Robbins  (1950).  The  stage  was  set  for  the  development  of 
the  field  of  selection  and  ranking  when  Bechhofer  published  his  now  classical 
1954 paper.

Some of the early applications of selection and ranking procedures were
in the area of animal science and agriculture. A few papers worth noting in
this  respect  are  Becker  (1961,  1963),  and  Putter  and  Soller  (1964,  1965). 
Several  papers  have  appeared  dealing  with  applications  to  tournaments,  traffic 
fatality  data,  system  performance  evaluation,  accounting,  reliability  models, 
education  and  psychology.  A  list  of  papers  with  such  specific  applications 
is given in the recent categorized bibliography of Dudewicz and Koo (1982,
pp.  88-92).  A  bibliography  of  applications  is  also  given  in  Gibbons,  Olkin 
and  Sobel  (1977).  For  some  papers  advocating  selection  and  ranking  in 
practice,  see  Chew  (1977a, b). 

Although  tables  needed  to  implement  the  procedures  were  available  in 
many  papers  starting  with  Bechhofer  (1954),  the  application  of  the  theory 
by  the  user  had  been  rather  slow.  Part  of  this  problem  until  recently 
was  due  to  a  lack  of  books  bringing  in  the  techniques  and  results  in  an 
easily  accessible  way  for  users  at  various  levels.  The  monograph  of 
Bechhofer,  Kiefer  and  Sobel  (1968)  was  the  first  book  to  deal  exclusively 
with  the  subject.  Though  written  for  the  theoretician  and  the  practitioner, 
the  book  represents  significant  contributions  of  the  authors  in  respect  of 
sequential  procedures  and  thereby  perhaps  is  accessible  only  to  sophisticated 
users.  Also,  the  period  1965  -  1975  was  the  period  of  main  growth  of  the 
field  and  as  such  it  would  have  been  rather  premature  for  a  methods-oriented 
book  for  the  practitioner  or  a  comprehensive  book  on  the  developments.  So 
it  was  not  until  the  late  seventies  that  the  next  two  books  fulfilling  these 
objectives  came  out.  The  book  of  Gibbons,  Olkin  and  Sobel  (1977)  brings  the 
basic  methodology  (mostly  using  the  indifference  zone  approach)  to  the 
practitioners  though  not  written  solely  for  them.  The  text  of  Gupta  and 
Panchapakesan (1979) provides a comprehensive survey of the developments in
all aspects of the theory with a special chapter on Guide to Tables.
These are followed by the books of Gupta and D. Y. Huang (1981) and Büringer,
Martin and Schriever (1980). Besides these, the text of Dudewicz (1976)
devotes  a  chapter  to  selection  and  ranking,  and  the  book  of  Kleijnen  (1975) 
refers  to  the  uses  of  several  selection  procedures  with  regard  to 
simulation.  In  addition  to  these,  several  expository  articles  have  appeared 
in  journals  from  time  to  time;  these  are  either  overviews  of  the  subject  or 
surveys  of  developments  dealing  with  certain  aspects  of  the  theory.  A 
special  issue  of  Communications  in  Statistics  -  Theory  and  Methods  (Volume 
A6,  Number  11)  was  devoted  to  selection  and  ranking  procedures. 

The  books  and  special  issues  mentioned  above  have  certainly  contributed 
to  further  developments  in  the  theory.  Equally  important  are  the  constant 
forums for exchange and dissemination of ideas provided by special meetings
and workshops. In this respect, special mention should be made of the
special course on selection and ranking under the auspices of The American
Statistical Association during its annual meeting in 1979 at Washington, D.C.,
and a similar special course organized at Naval Postgraduate School. The
lecturers in these two short courses were: Bechhofer, Gibbons, Gupta, and
Olkin. Also to be noted is the special advanced workshop held in Summer, 1982
organized  by  Dudewicz,  in  Honolulu,  Hawaii. 

In  this  connection,  mention  should  be  made  of  the  proceedings  of  three 
symposia  held  at  Purdue  in  1970,  1976  and  1981.  These  are  published  under 
the  title  Statistical  Decision  Theory  and  Related  Topics.  In  each  of  the 
three volumes, there are quite a few papers dealing with selection and ranking.
These  activities  described  above  have  brought  the  developments  in  the  field 
to  the  attention  of  research  workers  in  industry,  government  and  academia. 


Although  for  several  standard  situations,  tables  are  available  in  the 
original papers and in the book of Gibbons, Olkin and Sobel (1977), it is
desirable  to  develop  computer  packages  for  applications.  Some  packages  have 
recently  been  developed  by  Jason  Hsu  in  cooperation  with  S.  S.  Gupta. 

Considering  the  fact  that  many  of  the  activities  that  we  have  described 
in  the  preceding  paragraphs  took  place  within  the  last  five  years,  it  is 
perhaps  too  early  to  be  pessimistic  about  the  absence  of  dramatic  change  in 
the  attitudes  of  the  users.  The  major  hurdle,  if  we  may  call  it  so,  in 
adopting  the  selection  and  ranking  formulation  lies,  on  the  part  of  many 
applied  statisticians,  in  giving  up  the  testing  of  a  'null  hypothesis'. 

Finally,  as  our  review  would  indicate,  there  are  several  situations  in 
which  more  satisfactory  solutions  are  needed.  Some  of  the  areas  where  not 
much  has  been  done  are  multivariate  problems,  reliability  models,  and 
selection  of  the  best  predictor  variables.  Little  attention  has  been  paid 
to  problems  that  arise  with  regard  to  model  selection,  time  series  data  and 
signals in communications.